Add a stamp/handshake to let automated clients know the data for a new release has finished uploading #161

hroongtatrip · 2024-05-22T14:37:15Z

While each release of Overture data in S3 is under a dated, versioned folder (for example, 2024-05-16-beta.0/),
the Parquet files in it get uploaded at different times.
It would be helpful to have a "DONE" signal in the form of an empty file so that clients can trigger ingesting the data.

jwass · 2024-06-06T19:47:10Z

@hroongtatrip I know that Spark or Hadoop can write _SUCCESS files, which are empty and are created once all files are written. We could do something similar, I'd think.

hroongtatrip · 2024-06-06T19:51:33Z

Yes, that would work. Just some signal in the form of a 0-byte file when all is done. Thanks
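To make the client side concrete, here is a minimal sketch of how a client might decide which releases are safe to ingest. It assumes the marker is a zero-byte object named `_SUCCESS` sitting directly under the release folder, and that keys look like `release/<version>/...`; neither the name nor the layout has been decided in this thread, and the listing below is hypothetical. The function works on a plain list of keys, so it can be fed from whatever S3 listing call the client already uses.

```python
from typing import Iterable, Set

MARKER_NAME = "_SUCCESS"  # assumed marker name; not settled in this thread


def completed_releases(keys: Iterable[str], prefix: str = "release/") -> Set[str]:
    """Return the release versions whose folder contains the marker object."""
    done = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        # Expect keys shaped like "<prefix><version>/<rest>".
        version, _, rest = key[len(prefix):].partition("/")
        if version and rest == MARKER_NAME:
            done.add(version)
    return done


# Hypothetical listing: the first release has its marker, the second is still uploading.
listing = [
    "release/2024-05-16-beta.0/theme=places/part-00000.parquet",
    "release/2024-05-16-beta.0/_SUCCESS",
    "release/example-next-release/theme=places/part-00000.parquet",
]
print(completed_releases(listing))
```

A client would only start ingesting a version once it shows up in the returned set.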

jwass · 2024-06-06T19:54:44Z

@varapmsft @ibnt1 Any thoughts here? I think Spark can be configured to write this file automatically. Or we could do it manually. If an entire dataset is copied, we'd have to be careful that the marker is written only after all other files are.
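The ordering concern for the manual route can be sketched as follows. This is only an illustration using local directories as a stand-in for the bucket; the function name and the `_SUCCESS` marker name are assumptions, not part of any existing Overture tooling. The point is that the marker is skipped during the copy and created as the very last step.

```python
import shutil
from pathlib import Path

MARKER_NAME = "_SUCCESS"  # assumed marker name


def copy_release_then_mark(src: Path, dst: Path) -> Path:
    """Copy every file under src to dst, then create the empty marker last."""
    for path in sorted(src.rglob("*")):
        # Never copy an existing marker along with the data files.
        if path.is_file() and path.name != MARKER_NAME:
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    # Reached only if every copy above succeeded, so a failed copy leaves no marker.
    dst.mkdir(parents=True, exist_ok=True)
    marker = dst / MARKER_NAME
    marker.write_bytes(b"")  # zero-byte "done" signal
    return marker
```

With a real bucket, the same order would apply: upload all data objects first, then put the empty marker object.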

jwass · 2024-06-06T21:44:52Z

@ibnt1 pointed out that some tools might crash on the presence of additional _SUCCESS (and similar) files. We should at least check: Athena, DuckDB, pyarrow
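For clients that build their own file list, one defensive option is to filter the listing before handing it to a reader. The sketch below is a hypothetical helper, not something any of the tools above provide; it keeps only `.parquet` files and skips names starting with `_` or `.`, which is where a `_SUCCESS`-style marker would fall. It says nothing about how Athena, DuckDB, or pyarrow behave when pointed at the folder directly; that still needs to be checked.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def parquet_data_files(keys: Iterable[str]) -> List[str]:
    """Keep only Parquet data files, skipping names that start with '_' or '.'."""
    kept = []
    for key in keys:
        name = PurePosixPath(key).name
        if name.startswith(("_", ".")):
            continue  # marker or hidden file, e.g. _SUCCESS
        if name.endswith(".parquet"):
            kept.append(key)
    return kept
```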

@jenningsanderson made a simple script https://github.com/OvertureMaps/data/blob/main/utils/fetch-releases-from-s3.py whose result can be published. Then you check that file periodically for new data. This would allow new data to land in the right folder and for some testing to take place before publishing that file as "officially" released.
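On the client side, polling that published file could look roughly like the sketch below. The schema is an assumption made for illustration: a JSON document with a `releases` list of version strings. The actual output format of the script is not described in this thread, and the function name is hypothetical.

```python
import json
from typing import List, Set


def new_releases(published_json: str, seen: Set[str]) -> List[str]:
    """Return published release versions that have not been ingested yet."""
    published = json.loads(published_json)["releases"]  # assumed schema
    # Date-prefixed version names sort chronologically as plain strings.
    return sorted(v for v in published if v not in seen)
```

A client would fetch the file on a schedule, call this with the set of versions it has already ingested, and trigger ingestion for whatever comes back.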

hroongtatrip · 2024-06-06T22:24:00Z

The _SUCCESS file does not need to be in the same folder. Is that what the other script does? If so, that would work too.

bglazer-meta assigned jwass Jun 6, 2024

atiannicelli added the enhancement label Oct 14, 2024
