Fix batching on ParquetFileReader #103

Open
wants to merge 1 commit into
base: master
from

Conversation

jakedorne

fixes #100

Currently, the parquet file batcher calls hasNext while seeking the file, and hasNext itself checks whether seeked == true. As a result, the file reader reads the second batch repeatedly and never completes. Using the existing hasNextRecord instead fixes this, and I assume it was originally intended to be used here.
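
To illustrate the failure mode, here is a minimal Java sketch. It is a simplified, hypothetical model and not the project's code: the class `BatchSeekSketch`, the `useHasNextRecord` flag, and the exact batch-boundary condition inside `hasNext` are assumptions made for illustration. Only the names `hasNext`, `hasNextRecord`, and `seeked` come from the description above.

```java
import java.util.List;

// Simplified, hypothetical model of a batching file reader (not the project's code).
class BatchSeekSketch {
    private final List<String> records;
    private final int batchSize;
    private final boolean useHasNextRecord; // true = seek loop as proposed in this PR
    private int offset = 0;
    private boolean seeked = false;

    BatchSeekSketch(List<String> records, int batchSize, boolean useHasNextRecord) {
        this.records = records;
        this.batchSize = batchSize;
        this.useHasNextRecord = useHasNextRecord;
    }

    // Raw check: is there another record left in the file?
    boolean hasNextRecord() {
        return offset < records.size();
    }

    // Batch-aware check (assumed shape): reports "no more" at a batch
    // boundary unless a seek has just completed.
    boolean hasNext() {
        boolean atBoundary = batchSize > 0 && offset > 0 && offset % batchSize == 0;
        return (!atBoundary || seeked) && hasNextRecord();
    }

    // Skips records until the target offset is reached.
    // Assumption: seeked only becomes true after the loop finishes.
    void seek(int target) {
        seeked = false;
        while ((useHasNextRecord ? hasNextRecord() : hasNext()) && offset < target) {
            offset++;
        }
        seeked = true;
    }

    int offset() {
        return offset;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e", "f");

        BatchSeekSketch viaHasNext = new BatchSeekSketch(data, 2, false);
        viaHasNext.seek(4);
        System.out.println("seek via hasNext lands at offset " + viaHasNext.offset()); // 2

        BatchSeekSketch viaHasNextRecord = new BatchSeekSketch(data, 2, true);
        viaHasNextRecord.seek(4);
        System.out.println("seek via hasNextRecord lands at offset " + viaHasNextRecord.offset()); // 4
    }
}
```

In this model, seeking with hasNext stops at the first batch boundary, so every attempt to resume at the third batch lands back at the start of the second one. Seeking with hasNextRecord ignores batch boundaries and reaches the requested offset.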

This PR doesn't contain tests, sorry. To reproduce this in tests I had to replace the mocking with stubs, which broke other tests; fixing those would be a bigger change than I think this fix warrants. Here is a commit showing what I did to reproduce.

Thanks!

Development

Successfully merging this pull request may close these issues.

ParquetFileReader - Maximum of two batches processed using file_reader.batch_size set to > 0