
[SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap #14927

Closed
davies wants to merge 2 commits into master from davies/longmap

Conversation

davies (Contributor) commented Sep 1, 2016

What changes were proposed in this pull request?

In LongToUnsafeRowMap, we use the offset of a value as its pointer, stored both in the internal array and inside the page itself for chained values. That offset is not portable, because Platform.LONG_ARRAY_OFFSET differs between JVMs with different heap sizes (compressed oops on or off), so a LongToUnsafeRowMap serialized on one JVM and deserialized on another can be corrupted.

This PR changes the map to store a portable address (without Platform.LONG_ARRAY_OFFSET); the JVM-specific offset is added back only when a value is read.
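
To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual Spark code). `SIZE_BITS` and the two base-offset values are illustrative assumptions; the point is only that `Platform.LONG_ARRAY_OFFSET` can differ between the JVM that serializes the map and the JVM that deserializes it (for example when one of them has a heap large enough to disable compressed oops):

```scala
// Sketch only: contrasts the old (non-portable) and new (portable) encodings.
object PortableAddressSketch {
  val SIZE_BITS = 28 // assumed to mirror the constant in LongToUnsafeRowMap

  // Old encoding: the writer bakes its own Array[Long] base offset into the address.
  def packNonPortable(offsetInPage: Long, size: Long, writerBase: Long): Long =
    ((offsetInPage + writerBase) << SIZE_BITS) | size

  // New encoding (this PR): only the page-relative offset is stored; the reader
  // adds its own Platform.LONG_ARRAY_OFFSET when it actually touches memory.
  def packPortable(offsetInPage: Long, size: Long): Long =
    (offsetInPage << SIZE_BITS) | size

  def main(args: Array[String]): Unit = {
    val writerBase = 16L   // e.g. serializing JVM with compressed oops
    val readerBase = 24L   // e.g. deserializing JVM with a 32G+ heap
    val dataStart  = 1024L // page-relative offset of one value

    val oldAddr    = packNonPortable(dataStart, 32L, writerBase)
    val oldDecoded = oldAddr >>> SIZE_BITS        // 1040 = 16 + 1024
    println(oldDecoded == readerBase + dataStart) // false: reads the wrong bytes

    val newAddr    = packPortable(dataStart, 32L)
    val newDecoded = (newAddr >>> SIZE_BITS) + readerBase
    println(newDecoded == readerBase + dataStart) // true: correct on any JVM
  }
}
```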

How was this patch tested?

Added a test case with randomly generated keys to improve coverage. It is not a regression test for the bug itself, though: reproducing the bug would require a Spark cluster whose driver or executors have at least a 32G heap.

davies (Contributor, author) commented Sep 1, 2016

cc @rxin @sitalkedia

@@ -448,7 +448,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
private def nextSlot(pos: Int): Int = (pos + 2) & mask

private def getRow(address: Long, resultRow: UnsafeRow): UnsafeRow = {
-    val offset = address >>> SIZE_BITS
+    val offset = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
hvanhovell (Contributor) commented Sep 1, 2016

This may look a bit redundant. How about creating two methods:

private[this] def toOffset(address: Long): Long = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
private[this] def toAddress(offset: Long): Long = (offset - Platform.LONG_ARRAY_OFFSET) << SIZE_BITS

To make it clearer what you are doing there. They should be inlined, so the performance overhead is minimal.
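
For context, the changed line sits inside `getRow` roughly as sketched below. The surrounding lines are not part of the hunk above and are reconstructed under the assumption that only the offset computation changed; `SIZE_MASK`, `page`, and `resultRow.pointTo` are taken from the surrounding class rather than from this diff:

```scala
// Reconstructed sketch of getRow after the change (assumptions noted above).
private def getRow(address: Long, resultRow: UnsafeRow): UnsafeRow = {
  // The stored address is now portable; the JVM-specific base offset of
  // Array[Long] is added back only here, at read time.
  val offset = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
  val size = address & SIZE_MASK
  resultRow.pointTo(page, offset, size.toInt)
  resultRow
}
```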

SparkQA commented Sep 1, 2016

Test build #64804 has finished for PR 14927 at commit 0c8450c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 2, 2016

Test build #64809 has finished for PR 14927 at commit 09e1c89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

hvanhovell (Contributor) commented:

@davies are we making this assumption anywhere else in our unsafe code?

davies (Contributor, author) commented Sep 6, 2016

@hvanhovell This has an LGTM offline. Other contributors have also confirmed that this patch fixes the bug, so I'm going to merge it into master and branch-2.0.

asfgit closed this in f7e26d7 Sep 6, 2016
asfgit pushed a commit that referenced this pull request Sep 6, 2016
… in LongToUnsafeRowMap

Author: Davies Liu <[email protected]>

Closes #14927 from davies/longmap.

(cherry picked from commit f7e26d7)
Signed-off-by: Davies Liu <[email protected]>
hvanhovell (Contributor) commented:

LGTM

rishitesh pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Sep 26, 2016
… in LongToUnsafeRowMap

Author: Davies Liu <[email protected]>

Closes apache#14927 from davies/longmap.