Each release of Overture data in S3 lives under a dated version folder (for example, `2024-05-16-beta.0/`), but the Parquet files inside that folder are uploaded at different times.
It would be helpful to have a "DONE" signal in the form of an empty file, so that clients know when they can start ingesting the data.
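A client could poll for such a marker before starting ingestion. The sketch below assumes the marker is named `_SUCCESS` (a hypothetical choice borrowed from the Hadoop convention; Overture does not define one today) and uses a local directory as a stand-in for the release prefix in S3. Against S3 itself, the existence check would become a HEAD or list request on the same key.

```python
import time
from pathlib import Path

# Hypothetical marker name; not part of any current Overture release.
MARKER_NAME = "_SUCCESS"


def release_is_complete(release_dir: Path, marker: str = MARKER_NAME) -> bool:
    """Return True once the completion marker exists in the release folder."""
    return (release_dir / marker).is_file()


def wait_for_release(release_dir: Path,
                     poll_seconds: float = 60.0,
                     timeout_seconds: float = 3600.0,
                     marker: str = MARKER_NAME) -> bool:
    """Poll until the marker appears (True) or the timeout elapses (False)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if release_is_complete(release_dir, marker):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

For example, `wait_for_release(Path("/mnt/overture/2024-05-16-beta.0"))` would block until the marker appears or the one-hour default timeout passes; the mount path is only illustrative.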
@hroongtatrip I know that Spark or Hadoop can write an empty `_SUCCESS` file that is created only once all other output files have been written. I'd think we could do something similar.
@varapmsft @ibnt1 Any thoughts here? I think Spark can be configured to write this file automatically, or we could write it manually. If an entire dataset is copied, we'd have to be careful that the marker is written only after all the other files have been copied.
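The copy-ordering concern can be sketched as follows: remove any stale marker at the destination, copy every data file, and only then write the empty marker. This is a minimal sketch assuming a local directory stands in for the S3 prefix and the marker is the hypothetical `_SUCCESS`; an S3 upload would need to follow the same order.

```python
import shutil
from pathlib import Path

MARKER_NAME = "_SUCCESS"  # hypothetical marker name


def publish_release(src: Path, dst: Path, marker: str = MARKER_NAME) -> list[Path]:
    """Copy all data files from src to dst, then write the empty marker last."""
    dst.mkdir(parents=True, exist_ok=True)

    # Drop a stale marker first so clients never see it next to a
    # partially copied release.
    stale = dst / marker
    if stale.exists():
        stale.unlink()

    copied = []
    for path in sorted(src.rglob("*")):
        # Skip directories and any marker carried over from the source,
        # so the marker is never copied ahead of the data it vouches for.
        if not path.is_file() or path.name == marker:
            continue
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)

    # Every data file is now in place: write the zero-byte marker.
    (dst / marker).touch()
    return copied
```

Writing the marker in a separate final step, rather than copying it along with the data, is what guarantees that its presence implies the rest of the release is complete.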
@ibnt1 pointed out that some tools might crash when additional `_SUCCESS` (and similar) files are present. We should at least check Athena, DuckDB, and pyarrow.
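Whatever those checks show, a client can sidestep the issue by passing readers an explicit list of data files instead of the bare folder. The sketch below uses only the standard library and skips any path whose components start with `_` or `.` (the Hive-style convention for marker and temporary files). The resulting list could then be handed to a reader that accepts multiple paths, such as DuckDB's `read_parquet` or `pyarrow.parquet.read_table`; that behavior is worth confirming against the versions in use.

```python
from pathlib import Path


def parquet_files(release_dir: Path) -> list[str]:
    """List *.parquet data files under release_dir as sorted path strings.

    Files or directories whose names start with "_" or "." (for example a
    hypothetical `_SUCCESS` marker or a `_temporary/` folder) are excluded,
    so a reader given this list never sees them.
    """
    return sorted(
        str(p)
        for p in release_dir.rglob("*.parquet")
        if p.is_file()
        and not any(part.startswith(("_", "."))
                    for part in p.relative_to(release_dir).parts)
    )
```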