
[Doc] Enhancement the document for dataset and s3 #37241

Open
mapleFU opened this issue Aug 18, 2023 · 0 comments


@mapleFU
Member

mapleFU commented Aug 18, 2023

Describe the enhancement requested

Currently, we have the following related issues:

  1. [Python][Dataset][Parquet] Enable Pre-Buffering by default for Parquet s3 datasets #36765
  2. [Python] Read table stuck and hangs forever #37139

When reading Parquet from S3, it's important to prefetch some files or row groups. However, a larger prefetch depth can increase memory usage. Worse, interfaces such as to_table and other read paths may behave differently with respect to prefetching. So we should add more documentation for it.

Component(s)

Documentation, Parquet, Python
