Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Correctly coerce Parquet Int96 timestamps into requested TimeUnits #1532

Merged

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Aug 7, 2023

Addresses the first part of issue #1527

Instead of always naively parsing Parquet Int96 timestamps into TimeUnit::Nanosecond, we match on the requested timeunit and perform timeunit-specific parsing

This makes parsing safer when reading Int96 timestamps that are outside of the range of timestamp[ns] (e.g. timestamps with dates like the years 1000 or 3000) instead of the current behavior which is to always parse the timestamps with the Nanosecond timeunit, and overflow.

@jaychia
Copy link
Contributor Author

jaychia commented Aug 8, 2023

Also happy to move the int96_to_i64_*s logic over into the parquet2 repo!

@ritchie46
Copy link
Collaborator

Also happy to move the int96_to_i64_*s

I believe that all arrow datatype related conversions are done here, so that is fine.

@codecov
Copy link

codecov bot commented Aug 8, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.66% ⚠️

Comparison is base (b09e580) 83.73% compared to head (b749735) 83.07%.
Report is 49 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1532      +/-   ##
==========================================
- Coverage   83.73%   83.07%   -0.66%     
==========================================
  Files         389      389              
  Lines       41811    42632     +821     
==========================================
+ Hits        35009    35417     +408     
- Misses       6802     7215     +413     
Files Changed Coverage Δ
src/io/parquet/read/deserialize/simple.rs 86.27% <100.00%> (+1.76%) ⬆️

... and 54 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jaychia jaychia requested a review from ritchie46 August 8, 2023 22:11
@jaychia
Copy link
Contributor Author

jaychia commented Aug 9, 2023

Second PR is up! #1533

@jaychia
Copy link
Contributor Author

jaychia commented Aug 23, 2023

@ritchie46 any chance you could retrigger CI - Seems to be failing on some flaky token issues?

{'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

@jaychia
Copy link
Contributor Author

jaychia commented Sep 2, 2023

Thanks for the stamp @sundy-li!

I had to make a new PR to fix lints since they were re-enabled last week. Would appreciate some help with launching CI :)

@sundy-li
Copy link
Collaborator

sundy-li commented Sep 3, 2023

Merged, will create a new pr to fix the clippy.

@sundy-li sundy-li merged commit a913919 into jorgecarleitao:main Sep 3, 2023
24 of 25 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants