This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Fixing bug where large index files weren't being read fully #489

Merged
merged 1 commit into from
Aug 11, 2021

Conversation

alex-shchetkov
Contributor

What is the context for this pull request?

I ran into an issue where I was unable to use any of the created indexes, because a JSON parser reported that it had encountered invalid characters.

This was misleading, because the actual issue was that only a portion of the index file was being read.

  • Tracking Issue: N/A
  • Parent Issue: N/A
  • Dependencies: N/A

What changes were proposed in this pull request?

Changing the FileSystem.read() call to FileSystem.readFully().
A single .read() call may return fewer bytes than requested, so it does not always read in the full file.
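
To illustrate the difference, here is a minimal sketch using only `java.io`. It does not reproduce the Hyperspace code or the Hadoop API. `ChunkedStream`, `singleRead`, `fullRead`, and the 4096-byte chunk size are hypothetical names and values chosen for the demo; the stream simply stands in for a source (such as a remote object store) that returns data in pieces.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadFullyDemo {
    // Hypothetical stream that returns at most `chunk` bytes per read call.
    static class ChunkedStream extends FilterInputStream {
        private final int chunk;

        ChunkedStream(InputStream in, int chunk) {
            super(in);
            this.chunk = chunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, chunk));
        }
    }

    // One read() call: returns how many bytes were actually placed in the buffer.
    static int singleRead(byte[] data, int chunk) throws IOException {
        byte[] buf = new byte[data.length];
        try (DataInputStream in =
                new DataInputStream(new ChunkedStream(new ByteArrayInputStream(data), chunk))) {
            return in.read(buf);
        }
    }

    // readFully() keeps reading until the buffer is full, or throws EOFException.
    static byte[] fullRead(byte[] data, int chunk) throws IOException {
        byte[] buf = new byte[data.length];
        try (DataInputStream in =
                new DataInputStream(new ChunkedStream(new ByteArrayInputStream(data), chunk))) {
            in.readFully(buf);
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        Arrays.fill(data, (byte) '{');

        System.out.println("bytes from one read(): " + singleRead(data, 4096));
        System.out.println("readFully() got everything: "
                + Arrays.equals(fullRead(data, 4096), data));
    }
}
```

In this sketch, the part of the buffer that a short `read()` leaves unfilled stays zeroed. If the index file is loaded into a buffer sized to the file length (an assumption, not something confirmed here), that could explain why the parser complained about invalid characters instead of a truncated file.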

This bug fix very likely fixes these:
#431
#373
#297 (comment) (point #2)

Does this PR introduce any user-facing change?

No

How was this patch tested?

I compiled and packaged the code, then ran it on an EMR (Spark 3.1) cluster to generate a relatively large index file (8 MB in my case) in an S3 location.
With this change, I was able to use the index to run a query.

Collaborator

@sezruby sezruby left a comment


Thanks for providing the fix!

@sezruby sezruby added this to the v0.5.0 milestone Aug 11, 2021
@sezruby sezruby added the bug Something isn't working label Aug 11, 2021
@sezruby sezruby merged commit c2f4f04 into microsoft:master Aug 11, 2021