Quote ingest using apache stack: arrow / parquet #536
Labels:
- data-layer: real-time and historical data processing and storage
- dependencies: we are the dependent, or are you?
- fsp: financial signal processing
- integration: external stack and/or lib augmentations
- perf: efficiency and latency optimization
- research: probably just a link dump..
In follow up to #486, it'd sure be nice to be able to move away from our current `multiprocessing.shared_memory` approach for real-time quote/tick ingest and possibly leverage an apache standard format such as `arrow` and `parquet`. As part of improving the `.parquet` file based tsdb IO from #486, obviously it'd be ideal to support df appends instead of only full overwrites 😂.
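For reference, a minimal sketch of what an arrow-backed tick batch round-trip might look like; the field names and sample quote layout below are illustrative assumptions, not our actual schema:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# hypothetical tick records as they might come off a feed;
# field names here are made up for illustration.
ticks = [
    {'time': 1661011200_000_000_000, 'price': 4128.25, 'size': 3},
    {'time': 1661011200_500_000_000, 'price': 4128.50, 'size': 1},
]

# build an in-memory arrow table from the records; arrow uses the
# same columnar layout parquet serializes, so conversion is cheap.
table = pa.Table.from_pylist(ticks)

# write out a .parquet file; note this is a full-file write,
# not an append (the limitation discussed in this issue).
pq.write_table(table, 'ticks.parquet')

# read it back as a table (or `.to_pandas()` for a df).
assert pq.read_table('ticks.parquet').equals(table)
```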
ToDo content from #486 pertaining to the `StorageClient.write_ohlcv()` write on backfills and rt ingest: rn the write is masked out mostly bc there's some details to work out on when/how frequently the writes to parquet files should happen, particularly whether to "append" to parquet files. Turns out there's options for appending (faster than overwriting i guess?) to parquet, particularly using `fastparquet`; see the below resources (and the sketch after this list):
- for python we can likely use: https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write
  - `times` options with the int96 format which embeds nanoseconds B)
  - `custom_metadata`: dict can only be used on overwrite 👀 to update metadata if needed?
- https://stackoverflow.com/questions/39234391/how-to-append-data-to-an-existing-parquet-file
- https://stackoverflow.com/questions/47191675/pandas-write-dataframe-to-parquet-format-with-append/74209756#74209756
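A rough sketch of the `fastparquet.write()` append flow from the docs linked above; the file name, df columns, and metadata key are hypothetical (note `append=True` requires the file to already exist, hence the two-step write):

```python
import pandas as pd
from fastparquet import write

# hypothetical ohlcv frames; column names are illustrative only.
df1 = pd.DataFrame({
    'time': pd.to_datetime(['2022-08-20 14:00', '2022-08-20 14:01']),
    'open': [4128.25, 4129.00],
    'close': [4129.00, 4128.75],
})
df2 = pd.DataFrame({
    'time': pd.to_datetime(['2022-08-20 14:02']),
    'open': [4128.75],
    'close': [4130.00],
})

# initial (overwrite) write: `times='int96'` stores timestamps in
# the int96 format which embeds nanoseconds, and `custom_metadata`
# can only be set here, on overwrite, not on appends.
write(
    'ohlcv.parquet',
    df1,
    times='int96',
    custom_metadata={'symbol': 'mnq.globex'},
)

# subsequent writes can append new row groups to the existing
# file instead of rewriting the whole thing.
write('ohlcv.parquet', df2, append=True, times='int96')
```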
other langs and spark related: