-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap #14927
Conversation
cc @rxin @sitalkedia |
@@ -448,7 +448,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap | |||
private def nextSlot(pos: Int): Int = (pos + 2) & mask | |||
|
|||
private def getRow(address: Long, resultRow: UnsafeRow): UnsafeRow = { | |||
val offset = address >>> SIZE_BITS | |||
val offset = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may sound a bit redundant. How about creating two methods:
private[this] def toOffset(address: Long): Long = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
private[this] def toAdress(offset: Long): Long = (offset - Platform.LONG_ARRAY_OFFSET) << SIZE_BITS
To make it clearer what you are doing there. They should be inlined, so the performance overhead is minimal.
Test build #64804 has finished for PR 14927 at commit
|
Test build #64809 has finished for PR 14927 at commit
|
@davies are we making this assumption anywhere else in our unsafe code? |
@hvanhovell Has lgtm offline. Other contributors also confirm that this patch fix the bug, I'm going to merge this one into master and 2.0 branch. |
… in LongToUnsafeRowMap ## What changes were proposed in this pull request? In LongToUnsafeRowMap, we use offset of a value as pointer, stored in a array also in the page for chained values. The offset is not portable, because Platform.LONG_ARRAY_OFFSET will be different with different JVM Heap size, then the deserialized LongToUnsafeRowMap will be corrupt. This PR will change to use portable address (without Platform.LONG_ARRAY_OFFSET). ## How was this patch tested? Added a test case with random generated keys, to improve the coverage. But this test is not a regression test, that could require a Spark cluster that have at least 32G heap in driver or executor. Author: Davies Liu <[email protected]> Closes #14927 from davies/longmap. (cherry picked from commit f7e26d7) Signed-off-by: Davies Liu <[email protected]>
LGTM |
… in LongToUnsafeRowMap ## What changes were proposed in this pull request? In LongToUnsafeRowMap, we use offset of a value as pointer, stored in a array also in the page for chained values. The offset is not portable, because Platform.LONG_ARRAY_OFFSET will be different with different JVM Heap size, then the deserialized LongToUnsafeRowMap will be corrupt. This PR will change to use portable address (without Platform.LONG_ARRAY_OFFSET). ## How was this patch tested? Added a test case with random generated keys, to improve the coverage. But this test is not a regression test, that could require a Spark cluster that have at least 32G heap in driver or executor. Author: Davies Liu <[email protected]> Closes apache#14927 from davies/longmap.
What changes were proposed in this pull request?
In LongToUnsafeRowMap, we use offset of a value as pointer, stored in a array also in the page for chained values. The offset is not portable, because Platform.LONG_ARRAY_OFFSET will be different with different JVM Heap size, then the deserialized LongToUnsafeRowMap will be corrupt.
This PR will change to use portable address (without Platform.LONG_ARRAY_OFFSET).
How was this patch tested?
Added a test case with random generated keys, to improve the coverage. But this test is not a regression test, that could require a Spark cluster that have at least 32G heap in driver or executor.