You can give JuliaData/MemPool.jl#60 a try; it is my new WIP approach to swap-to-disk (just set the environment variable JULIA_MEMPOOL_EXPERIMENTAL_FANCY_ALLOCATOR=1 to enable it). I will warn you that it's not ready yet:
- Performance of swapped-out data reads is currently bad, because data is not properly migrated back to memory (instead, every read goes to disk)
- The memory usage limit is not yet tunable, and defaults to 8GB
- The disk usage limit is currently unbounded, so it will use all of your disk space if you allocate too much (everything is stored in .mempool relative to your current working directory, in case you need to delete those files manually)
I plan to begin DTable testing of that PR soon but haven't had the chance to get to it yet. Do feel free to give it a spin! I'll let you know once I've fixed the above issues.
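For anyone trying this, here is a minimal sketch of enabling the flag from inside a Julia session and keeping an eye on the swap directory, since the disk limit is unbounded. It assumes the flag is read when MemPool is loaded (so it has to be set before `using Dagger`) and that the MemPool.jl#60 branch is installed; `mempool_disk_usage` is a hypothetical helper written for illustration, not part of MemPool or Dagger.

```julia
# Assumption: the flag is read when MemPool is loaded, so set it
# before any `using Dagger` / `using MemPool` in the session.
ENV["JULIA_MEMPOOL_EXPERIMENTAL_FANCY_ALLOCATOR"] = "1"

# Hypothetical helper (not a MemPool or Dagger function): total size in
# bytes of all files under `dir`, or 0 if the directory does not exist.
function mempool_disk_usage(dir::AbstractString = ".mempool")
    isdir(dir) || return 0
    total = 0
    for (root, _, files) in walkdir(dir)
        for file in files
            total += filesize(joinpath(root, file))
        end
    end
    return total
end

println("swap directory size: ", mempool_disk_usage(), " bytes")
```

After this, load Dagger and build the DTable as in the original report, calling `mempool_disk_usage()` now and then to check how much has been swapped out. Setting the variable in the shell before launching Julia should work as well.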
I tried
tbl = Dagger.DTable(Parquet.read_parquet, my_files)
where "my_files" is an array of paths to parquet files that were written from a dask dataframe. It seems to be loading everything into memory. I'd like a way to process the data out-of-core, similar to dask; I was under the impression this was a goal for DTable. Thanks.