Commit
Fix issue NVIDIA#43 (empty files creation) and improve reading/writing speed (NVIDIA#57)

This commit fixes issue NVIDIA#43 (empty files created when invoking the reshard_jsonl method in nemo_curator.utils.file_utils.py) by checking the size of each file after it is generated and deleting any file whose size is zero.

In addition, I noticed that there is no need to parse the content of each line into a JSON object, since the lines should already be in JSON format. Removing that extra parsing significantly speeds up this method.

Signed-off-by: Miguel Martínez <[email protected]>
Signed-off-by: Vibhu Jawa <[email protected]>
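The two ideas described in the message can be illustrated with a minimal sketch. This is not the actual reshard_jsonl code from nemo_curator; the function names `reshard_lines` and `remove_empty_files`, the shard file naming, and the `lines_per_shard` parameter are all assumptions made for illustration. It copies each input line verbatim (no JSON parsing) and then deletes any output file whose size is zero.

```python
import os
from typing import Iterable, List


def remove_empty_files(paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: delete zero-byte files, return the paths that remain."""
    kept = []
    for path in paths:
        if os.path.getsize(path) == 0:
            os.remove(path)
        else:
            kept.append(path)
    return kept


def reshard_lines(input_paths: Iterable[str], output_dir: str,
                  lines_per_shard: int) -> List[str]:
    """Hypothetical sketch: regroup JSONL lines into shards without parsing them."""
    if lines_per_shard <= 0:
        raise ValueError("lines_per_shard must be positive")
    os.makedirs(output_dir, exist_ok=True)

    written: List[str] = []
    out = None
    count = 0
    try:
        for path in input_paths:
            with open(path, "r", encoding="utf-8") as src:
                for line in src:
                    if not line.strip():
                        continue  # skip blank lines
                    if out is None or count >= lines_per_shard:
                        if out is not None:
                            out.close()
                        # Assumed naming scheme, for illustration only.
                        shard = os.path.join(
                            output_dir, f"shard_{len(written):05d}.jsonl")
                        out = open(shard, "w", encoding="utf-8")
                        written.append(shard)
                        count = 0
                    # Each line is assumed to be valid JSON already, so it is
                    # written as-is instead of being parsed and re-serialized.
                    out.write(line if line.endswith("\n") else line + "\n")
                    count += 1
    finally:
        if out is not None:
            out.close()

    # Check the size of every generated file and drop the empty ones.
    return remove_empty_files(written)
```

Skipping the parse step trades validation for speed: a malformed input line would be copied through unchanged, so this approach fits only when the inputs are already trusted to be well-formed JSONL.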