parquet: add method to get both the inner writer and the file metadata when closing SerializedFileWriter #5253

Closed
conradludgate opened this issue Dec 29, 2023 · 2 comments · Fixed by #5471
Labels
enhancement (Any new improvement worthy of an entry in the changelog), parquet (Changes to the parquet crate)

Comments

@conradludgate
Contributor

conradludgate commented Dec 29, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I want to access the FileMetadata from a closed parquet file so that I can add some logging, but I also need to access the inner writer for further processing.

Describe the solution you'd like

SerializedFileWriter offers:

  • into_inner() -> Result<W>
  • close() -> Result<FileMetadata>

The bodies of both functions are almost identical. Perhaps close can return Result<(FileMetadata, W)>.
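As a rough illustration of the shape being asked for, here is a toy writer built only on the standard library. It is not the parquet crate's code: `SketchFileWriter`, `SketchMetadata`, the `b"END"` footer, and the method name `close_with_inner` are all hypothetical stand-ins. The point is only that one shared body can return both values, and the two existing methods can be thin wrappers around it.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the file metadata returned on close.
#[derive(Debug, PartialEq)]
struct SketchMetadata {
    num_rows: i64,
}

/// Toy writer used for illustration only; NOT the real SerializedFileWriter.
struct SketchFileWriter<W: Write> {
    inner: W,
    num_rows: i64,
}

impl<W: Write> SketchFileWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, num_rows: 0 }
    }

    /// Writes one "row" and counts it.
    fn write_row(&mut self, row: &[u8]) -> io::Result<()> {
        self.inner.write_all(row)?;
        self.num_rows += 1;
        Ok(())
    }

    /// Shared body: write a (made-up) footer, flush, then hand back
    /// both the metadata and the inner writer.
    fn close_with_inner(mut self) -> io::Result<(SketchMetadata, W)> {
        self.inner.write_all(b"END")?;
        self.inner.flush()?;
        let metadata = SketchMetadata { num_rows: self.num_rows };
        Ok((metadata, self.inner))
    }

    /// Metadata only, implemented on top of the shared body.
    fn close(self) -> io::Result<SketchMetadata> {
        self.close_with_inner().map(|(metadata, _)| metadata)
    }

    /// Inner writer only, implemented on top of the shared body.
    fn into_inner(self) -> io::Result<W> {
        self.close_with_inner().map(|(_, inner)| inner)
    }
}
```

With this shape a caller can log the metadata and keep using the writer from a single call, without reopening the file.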

Describe alternatives you've considered

For now, I will use into_inner() and then open the file with SerializedFileReader to get the metadata.

Additional context

  1. close() does not flush the underlying writer, so any error from the final flush is silently ignored.
  2. I would like async support, but I don't want to go through arrow. For now I am writing to an in memory buffer and then flushing the buffer over the network after I close the file.
  3. TrackedWriter always wraps the writer in a BufWriter, which means I am now double buffered. I would prefer that W itself be a BufWriter (or otherwise buffered) instead of having a buffer forced on top.
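The concern in point 1 can be reproduced with the standard library alone, assuming the writer sits behind a `std::io::BufWriter` that is dropped without an explicit flush. The sketch below uses a hypothetical `FailingSink` (not part of parquet) whose writes always fail. Dropping the `BufWriter` still attempts a flush, but `Drop` cannot return the error, so the caller sees `Ok`. Flushing explicitly surfaces the failure.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink whose writes always fail, standing in for a
/// broken file or socket.
struct FailingSink;

impl Write for FailingSink {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "sink is broken"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes a few bytes and lets the BufWriter drop without flushing.
/// The bytes only reach the buffer, so write_all succeeds, and the
/// failed flush during Drop is discarded.
fn write_and_drop() -> io::Result<()> {
    let mut writer = BufWriter::new(FailingSink);
    writer.write_all(b"footer")?;
    Ok(())
}

/// Same write, but flushes explicitly so the sink's error is returned.
fn write_and_flush() -> io::Result<()> {
    let mut writer = BufWriter::new(FailingSink);
    writer.write_all(b"footer")?;
    writer.flush()
}
```

Here `write_and_drop()` reports success even though nothing reached the sink, while `write_and_flush()` returns the error.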
conradludgate added the enhancement label Dec 29, 2023
@tustvold
Contributor

Adding a finish method that returns both makes sense to me

@tustvold
Contributor

label_issue.py automatically added labels {'parquet'} from #5471
