Quote ingest using apache stack: arrow / parquet #536

Open
4 tasks
goodboy opened this issue Oct 31, 2023 · 0 comments
Labels
data-layer real-time and historical data processing and storage dependencies we are the dependent, or are you? fsp financial signal processing integration external stack and/or lib augmentations perf efficiency and latency optimization research probably just a link dump..


goodboy commented Oct 31, 2023

As a follow up to #486, it'd sure be nice to be able to move away
from our current multiprocessing.shared_memory approach for
real-time quote/tick ingest and possibly leverage an Apache
standard format such as arrow and parquet.

As part of improving the .parquet file based tsdb IO from #486,
it'd obviously be ideal to support df appends instead of only full
overwrites 😂.


ToDo content from #486

pertaining to StorageClient.write_ohlcv() writes on backfills and
rt ingest. rn the write is masked out, mostly bc there are some
details to work out on when/how frequently the writes to parquet
files should happen, particularly whether to "append" to parquet
files: turns out there are options for appending (faster than
overwriting i guess?) to parquet, particularly using fastparquet,
see the resources below:

@goodboy goodboy added dependencies we are the dependent, or are you? integration external stack and/or lib augmentations data-layer real-time and historical data processing and storage fsp financial signal processing research probably just a link dump.. perf efficiency and latency optimization labels Jan 29, 2024