ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail #979
**main**: `orc/java/core/src/java/org/apache/orc/impl/ReaderImpl.java`, lines 720 to 725 in 3a2cb60

**branch-1.5**: `orc/java/core/src/java/org/apache/orc/impl/ReaderImpl.java`, lines 487 to 490 in 5f88704
We use Spark 3.2.0, Hive 2.3.9, and ORC 1.6.11. The current workaround is to set the client-side ORC split cache size to 0:

```xml
<property>
  <name>hive.orc.cache.stripe.details.mem.size</name>
  <value>0</value>
</property>
```

The property is defined in Hive as:

```java
HIVE_ORC_CACHE_STRIPE_DETAILS_MEMORY_SIZE("hive.orc.cache.stripe.details.mem.size", "256Mb",
    new SizeValidator(), "Maximum size of orc splits cached in the client."),
```
It seems that the PR doesn't pass the UTs. Could you check the UT failures?
Could you add a test case for your code, @cxzl25 ?
OK, let me see how to add a unit test to cover this case.
+1, LGTM. Thank you, @cxzl25 .
cc @pgaref and @williamhyun
…979

### What changes were proposed in this pull request?
Use the buffer limit as `readSize` to avoid `IndexOutOfBoundsException`.

**main** https://github.com/apache/orc/blob/3a2cb60e4ab6af6305c351fbdb51b98f460f64a0/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L720-L725

**branch-1.5** https://github.com/apache/orc/blob/5f88704d9bd36fc55b57a60c2fbbd35980b1b7e5/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L487-L490

### Why are the changes needed?
ORC-251 removed `ReaderImpl.extractFileTail`, and ORC-685 added it back. Since ORC-685, the file length is used as `readSize`. That is incorrect when the buffer is read from the cache, and it results in an `IndexOutOfBoundsException`:

```
long readSize = fileLen != -1 ? fileLen : buffer.limit();
int psLen = buffer.get((int) (readSize - 1)) & 0xff;
```

```
Caused by: java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkIndex(Buffer.java:540)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
    at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:726)
    at org.apache.hadoop.hive.ql.io.orc.LocalCache.getAndValidate(LocalCache.java:103)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.getSplits(OrcInputFormat.java:798)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.runGetSplitsSync(OrcInputFormat.java:916)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$ETLSplitStrategy.generateSplitWork(OrcInputFormat.java:885)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.scheduleSplits(OrcInputFormat.java:1759)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1703)
```

### How was this patch tested?
Local test.

(cherry picked from commit f53b149)
Signed-off-by: Dongjoon Hyun <[email protected]>
…979 (same commit message as above, backported again)
(cherry picked from commit f53b149)
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 546f72a)
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
Use the buffer limit as `readSize` to avoid `IndexOutOfBoundsException`.

**main**: `orc/java/core/src/java/org/apache/orc/impl/ReaderImpl.java`, lines 720 to 725 in 3a2cb60

**branch-1.5**: `orc/java/core/src/java/org/apache/orc/impl/ReaderImpl.java`, lines 487 to 490 in 5f88704
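A minimal sketch of the proposed change, assuming the buffer ends at the last byte of the ORC file, whose final byte stores the postscript length. `ReadSizeFix` and `postScriptLength` are hypothetical names; this only illustrates indexing by `buffer.limit()` and is not the patched `ReaderImpl.extractFileTail`.

```java
import java.nio.ByteBuffer;

public class ReadSizeFix {

  // Hypothetical helper. Assumes the buffer's last byte is the last byte of the
  // ORC file, i.e. the postscript length.
  static int postScriptLength(ByteBuffer buffer) {
    // Use the buffer limit rather than the file length: a cached buffer may be
    // shorter than the file, so the index must stay within [0, limit).
    int readSize = buffer.limit();
    // & 0xff turns the signed byte into an unsigned value in 0..255.
    return buffer.get(readSize - 1) & 0xff;
  }

  public static void main(String[] args) {
    // Illustrative 4-byte tail whose last byte (0xC8) encodes a postscript length of 200.
    ByteBuffer tail = ByteBuffer.wrap(new byte[] {1, 2, 3, (byte) 0xC8});
    System.out.println(postScriptLength(tail)); // prints 200
  }
}
```

Because the index is derived from the buffer itself, the lookup stays in bounds whether the buffer was freshly read from the file or served from a cache.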
### Why are the changes needed?
ORC-251 removed `ReaderImpl.extractFileTail`, and ORC-685 added `ReaderImpl.extractFileTail` back.

Since ORC-685, the file length is used as `readSize`. When the buffer is read from the cache, `buffer.limit()` is smaller than the file length, so reading the byte at index `fileLen - 1` goes past the end of the buffer and throws `IndexOutOfBoundsException`.
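To make the failure mode concrete, here is a minimal, self-contained sketch (not ORC or Hive code) that applies the pre-fix logic, `fileLen != -1 ? fileLen : buffer.limit()`, to a buffer that is shorter than the file. The class `ReadSizeBug`, the helper `oldPostScriptLength`, the 1000-byte file length, and the 4-byte buffer contents are all illustrative assumptions.

```java
import java.nio.ByteBuffer;

public class ReadSizeBug {

  // Pre-fix logic: prefer the file length whenever it is known.
  static int oldPostScriptLength(ByteBuffer buffer, long fileLen) {
    long readSize = fileLen != -1 ? fileLen : buffer.limit();
    return buffer.get((int) (readSize - 1)) & 0xff;
  }

  public static void main(String[] args) {
    // Hypothetical cached buffer: only the last 4 bytes of a 1000-byte file.
    ByteBuffer cachedTail = ByteBuffer.wrap(new byte[] {1, 2, 3, (byte) 0xC8});
    long fileLen = 1000;
    try {
      oldPostScriptLength(cachedTail, fileLen);
    } catch (IndexOutOfBoundsException e) {
      // get(999) is past limit() == 4: the same kind of failure as Buffer.checkIndex in the trace.
      System.out.println("IndexOutOfBoundsException: index " + (fileLen - 1)
          + " >= limit " + cachedTail.limit());
    }
    // Indexing by the buffer limit instead reads the last byte successfully.
    System.out.println(cachedTail.get(cachedTail.limit() - 1) & 0xff); // prints 200
  }
}
```

Running `main` prints `IndexOutOfBoundsException: index 999 >= limit 4` followed by `200`.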
### How was this patch tested?
Local test.