Fixing bug where large index files weren't being read fully #489

alex-shchetkov · 2021-08-10T23:07:30Z

What is the context for this pull request?

I ran into an issue where I could not use any of the created indexes, because the JSON parser reported that it had encountered invalid characters.

This was misleading, because the actual issue was that only a portion of the index file was being read.

Tracking Issue: N/A
Parent Issue: N/A
Dependencies: N/A

What changes were proposed in this pull request?

Changing the FileSystem.read() call to FileSystem.readFully().
A single .read() call may return before the buffer is filled, so it does not always read in the full file.
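
For illustration, here is a minimal sketch of the difference using only java.io, not the Hadoop classes this PR touches. ChunkedStream, readOnce, and readAll are hypothetical names, and the 4096-byte chunk size is an arbitrary assumption standing in for whatever a remote store happens to return per call.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadFullyDemo {
    /** Hypothetical stream that returns at most `chunk` bytes per bulk read call. */
    static class ChunkedStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        ChunkedStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Short reads are allowed by the InputStream contract.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** Calls read() once and returns how many bytes actually landed in the buffer. */
    static int readOnce(byte[] data, int chunk) {
        byte[] buf = new byte[data.length];
        try (DataInputStream in = new DataInputStream(new ChunkedStream(data, chunk))) {
            return in.read(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Uses readFully(), which keeps reading until the buffer is full or throws EOFException. */
    static byte[] readAll(byte[] data, int chunk) {
        byte[] buf = new byte[data.length];
        try (DataInputStream in = new DataInputStream(new ChunkedStream(data, chunk))) {
            in.readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        System.out.println(readOnce(data, 4096));        // prints 4096: only part of the data
        System.out.println(readAll(data, 4096).length);  // prints 10000: the whole buffer
    }
}
```

With a single read(), the tail of the buffer stays zero-filled, which would be consistent with a JSON parser complaining about invalid characters rather than about a truncated file.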

This bug fix very likely fixes these:
#431
#373
#297 (comment) (point #2)

Does this PR introduce any user-facing change?

No

How was this patch tested?

I compiled and packaged the code and ran it on an EMR (Spark 3.1) cluster to generate a relatively large index file (8 MB in my case) in an S3 location.
With this change, I was able to use the index to run a query.


sezruby

Thanks for providing the fix!

changing from read to readFully to fix the bug where index file wasn't being read fully

56f15e4

sezruby assigned alex-shchetkov Aug 11, 2021

sezruby requested review from clee704 and imback82 August 11, 2021 07:31

sezruby approved these changes Aug 11, 2021


sezruby added this to the v0.5.0 milestone Aug 11, 2021

sezruby added the bug label Aug 11, 2021

sezruby merged commit c2f4f04 into microsoft:master Aug 11, 2021

This was referenced Aug 11, 2021

No module named Hyperspace on AWS EMR #297

Closed

JsonMappingException on index create and read #373

Closed
