[Data] Skip empty JSON files in read_json()
#47378
Updated from 4fe6fd9 to 430ea5e.
Tested the changes locally.

Before the change: [screenshot]

After the change: [screenshot]

@scottjlee, please review.
Could you also add a unit test in `test_json.py` to test the change in this PR? You can follow the logic in the reproducible example from the original issue. Thanks!
@@ -101,6 +101,11 @@ def _read_with_python_json(self, buffer: "pyarrow.lib.Buffer"):

    import pyarrow as pa

    # Check if the buffer is empty
    if buffer.size == 0:
        yield pa.Table.from_pylist([])  # Yield an empty PyArrow Table
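The check is needed because Python's `json` module rejects empty input outright, with the same `Expecting value: line 1 column 1 (char 0)` error quoted in this PR's description. The stdlib-only sketch below reproduces that error and shows the guard in isolation. It works on plain `bytes` rather than a `pyarrow.lib.Buffer`, and `parse_json_bytes` / `decode_error_message` are hypothetical helpers, not Ray code; the rule that a top-level list means "many rows" is likewise an assumption of the sketch.

```python
import json
from typing import Any, List


def decode_error_message(data: bytes) -> str:
    """Return the error json.loads raises for `data`, or "" if it parses."""
    try:
        json.loads(data)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""


def parse_json_bytes(data: bytes) -> List[Any]:
    """Hypothetical helper: parse one file's bytes into a list of rows."""
    # Same idea as the `buffer.size == 0` check: an empty file yields no rows
    # instead of reaching json.loads, which rejects empty input.
    if len(data) == 0:
        return []
    parsed = json.loads(data)
    # Assumption: a top-level list holds many rows; any other value is one row.
    return parsed if isinstance(parsed, list) else [parsed]


print(decode_error_message(b""))      # Expecting value: line 1 column 1 (char 0)
print(parse_json_bytes(b""))          # []
print(parse_json_bytes(b'{"a": 1}'))  # [{'a': 1}]
```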
In the case of an empty file, I think we can simply `return` instead of yielding an empty table, but let's confirm with a unit test.
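The difference between the two options is visible from the consumer's side of the generator: an early `return` produces no batches at all for an empty file, while yielding an empty table produces one zero-row batch that downstream code still has to handle. The sketch below is a hedged illustration under the assumption that a plain Python list can stand in for a `pyarrow.Table`; both generator functions are hypothetical.

```python
import json
from typing import Any, Iterator, List

# Stand-in for a pyarrow.Table: a batch is just a list of parsed rows.
Batch = List[Any]


def read_yield_empty(data: bytes) -> Iterator[Batch]:
    """Hypothetical variant: emit one empty batch for an empty file."""
    if len(data) == 0:
        yield []
        return  # without this, json.loads(b"") below would still raise
    yield json.loads(data)


def read_return_early(data: bytes) -> Iterator[Batch]:
    """Hypothetical variant: emit nothing for an empty file."""
    if len(data) == 0:
        return
    yield json.loads(data)


files = [b"", b'[{"a": 1}]']
yielded = [batch for f in files for batch in read_yield_empty(f)]
returned = [batch for f in files for batch in read_return_early(f)]
print(yielded)   # [[], [{'a': 1}]]
print(returned)  # [[{'a': 1}]]
```

Both variants produce the same rows once the batches are flattened; the early `return` presumably has the advantage that an empty file contributes no block whose (empty) schema would need reconciling with the non-empty files.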
Signed-off-by: venkatram-dev <[email protected]>
Updated from aa159ff to 8cf0444.
@scottjlee, I added a unit test that reads from a file path containing both an empty file and a non-empty file.
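The scenario this test covers, one directory holding an empty JSON file next to a non-empty one, can be sketched as a self-contained pytest-style test. To stay runnable without Ray, `read_json_dir` below is a hypothetical stdlib stand-in for `ray.data.read_json()` that applies the same skip-empty-files rule; it is an illustration of the test's shape, not the test that was added to `test_json.py`.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def read_json_dir(path: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for a directory reader that skips empty files."""
    rows: List[Dict[str, Any]] = []
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name), "rb") as f:
            data = f.read()
        if not data:
            continue  # empty file: contribute no rows instead of raising
        parsed = json.loads(data)
        rows.extend(parsed if isinstance(parsed, list) else [parsed])
    return rows


def test_read_json_skips_empty_file() -> None:
    with tempfile.TemporaryDirectory() as tmp_dir:
        # One empty file and one non-empty file in the same directory.
        open(os.path.join(tmp_dir, "empty.json"), "w").close()
        with open(os.path.join(tmp_dir, "data.json"), "w") as f:
            json.dump([{"id": 1}, {"id": 2}], f)

        rows = read_json_dir(tmp_dir)

    # Only the non-empty file's rows come back; no JSONDecodeError is raised.
    assert rows == [{"id": 1}, {"id": 2}]


if __name__ == "__main__":
    test_read_json_skips_empty_file()
    print("ok")
```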
Thanks for the fix!
LGTM, thanks!
## Why are these changes needed?

Skip empty files and do not raise `json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)`.

## Related issue number

Closes ray-project#47198

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: venkatram-dev <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>