ORC-1059: Align findColumns behaviour between 1.6 and 1.7 release #972
Change-Id: I3687491a8e430609374ba259d721e98bf4359ba8
+1, LGTM. Thank you, @pgaref .
When trying to bump Apache Hive to ORC 1.7, I noticed query failures related to SArg pushdown. In more detail, Hive may push down partition column filters that are not part of the file's columns. Up to 1.6 these columns were ignored, while 1.7 throws IllegalArgumentException.

Align findColumns behaviour between 1.6 and 1.7 release

Existing tests

(cherry picked from commit aee8132)
Signed-off-by: Dongjoon Hyun <[email protected]>
I cherry-picked this to branch-1.7 for Apache ORC 1.7.2 release.
Thanks for the quick review @dongjoon-hyun ! I suggest we keep this in mind for 1.8 (and switch back to that behaviour) where we could align downstream consumers like Hive to only push SArgs for columns that are part of the schema.
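The suggestion above (downstream consumers push SArgs only for columns that are part of the schema) could look roughly like the following caller-side check. This is only a sketch under stated assumptions: `SargColumnPruning` and `pushableColumns` are hypothetical names, the schema is modelled as a plain set of column names, and none of this is Hive or ORC code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SargColumnPruning {
    // Hypothetical helper: keep only the predicate columns that the file schema
    // actually contains, so the reader never sees an unknown column name.
    static List<String> pushableColumns(Set<String> fileSchemaColumns,
                                        List<String> predicateColumns) {
        List<String> kept = new ArrayList<>();
        for (String column : predicateColumns) {
            if (fileSchemaColumns.contains(column)) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed example data: "ds" stands in for a partition column that is
        // stored in the directory layout rather than in the file itself.
        Set<String> fileSchema = new LinkedHashSet<>(Arrays.asList("id", "name"));
        List<String> predicate = Arrays.asList("id", "ds");
        System.out.println(pushableColumns(fileSchema, predicate)); // prints [id]
    }
}
```

With a check like this on the consumer side, a strict reader would no longer be handed partition columns in the first place.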
Is it okay when we handle multiple ORC files in Hive schema evolutions?
In many cases, Hive partitions might have different schemas. The simplest case is new partitions that have additional columns. If a user runs a query over all partitions, the SArg can reference new columns that old partitions don't have. Apache Spark checks the physical schema when it opens a file and tries to adjust for the missing columns. Given this PR's description, Apache Hive doesn't, right? For me, throwing an exception could be too intrusive and the AS-IS status ( |
BTW, let me try to test the status of |
What changes were proposed in this pull request?
When trying to bump Apache Hive to ORC 1.7, I noticed query failures related to SArg pushdown.
In more detail, Hive may push down partition column filters that are not part of the file's columns. Up to 1.6 these columns were ignored, while 1.7 throws IllegalArgumentException. ORC-741 introduced this behavior change in Apache ORC 1.7.0.
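To make the difference concrete, here is a minimal sketch of the two lookup behaviours described above. It is not the actual `findColumns` implementation: the class, the method names, the exception message and the list-of-names schema are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindColumnsSketch {
    // Hypothetical stand-in for a file schema: just the top-level field names.
    static final List<String> FILE_COLUMNS = Arrays.asList("id", "name", "price");

    // Strict lookup: an unknown column is an error
    // (the behaviour reported for 1.7.0 in this description).
    static int findColumnStrict(List<String> schema, String name) {
        int index = schema.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Column " + name + " not found");
        }
        return index;
    }

    // Lenient lookup: unknown columns are skipped silently
    // (the behaviour reported for 1.6, which this change restores).
    static List<Integer> findColumnsLenient(List<String> schema, List<String> names) {
        List<Integer> indexes = new ArrayList<>();
        for (String name : names) {
            int index = schema.indexOf(name);
            if (index >= 0) {
                indexes.add(index);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // "ds" plays the role of a partition column that is absent from the file.
        List<String> predicateColumns = Arrays.asList("price", "ds");
        System.out.println(findColumnsLenient(FILE_COLUMNS, predicateColumns)); // prints [2]
        try {
            findColumnStrict(FILE_COLUMNS, "ds");
        } catch (IllegalArgumentException e) {
            System.out.println("strict lookup failed: " + e.getMessage());
        }
    }
}
```

In the lenient variant the predicate on `ds` simply contributes nothing to the pushed-down filter, which matches the "ignored" behaviour described for 1.6.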
Why are the changes needed?
Align findColumns behaviour between 1.6 and 1.7 release
How was this patch tested?
Existing tests