-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if schema is compatible in add_files
API
#907
Merged
+211
−164
Merged
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
21bcf02
schema check
sungwy 3269c14
Merge branch 'main' into add-files-check
sungwy 177edce
merge
sungwy e9004ae
fix
sungwy c33fe1c
Etc/UTC support
sungwy 2c95cd4
revert - not the right PR
sungwy 66a485c
Merge branch 'main' into add-files-check
sungwy 2641301
lint
sungwy 6c4f5d7
fix
sungwy dead345
adopt review suggestion - thank you Fokko!
sungwy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understand is that now if we enable
downcast-ns-timestamp-to-us-on-write
, we allow user to add parquet files withTIMESTAMP_NANOS
type data. My concern here is that we may add parquet files that not align with spec, which states that timestamp/timstamptz type should map toTIMESTAMP_MICROS
. Shall we be more restrictive when checking the parquet file that will be directly added to the table?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the catch @HonahX - I think you are right. Adding a nanosecond timestamp file doesn't correctly allow Spark Iceberg to read the file and instead results in exceptions like:
I will make downcast_ns_timestamp_to_us_on_write an input argument to _check_schema_compatible, so that we can prevent nanoseconds timestamp types from being added through add_files, but can continue to support it being downcast in overwrite/append
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think it is okay to be able to read more broadly. We do need tests to ensure that it works correctly. Looking at Arrow, there are already some physical types that we don't support (
date64
, etc). In Java, we do support reading Timestamps that are encoded usingINT96
, we should not produce them.