
[SPARK-16922] [SPARK-17211] [SQL] make the address of values portable in LongToUnsafeRowMap #14927

Closed
davies wants to merge 2 commits into master from davies/longmap

Conversation

davies (Contributor) commented Sep 1, 2016

What changes were proposed in this pull request?

In LongToUnsafeRowMap, we use the offset of a value as its pointer, stored both in the internal array and inside the page itself for chained values. That offset is not portable, because Platform.LONG_ARRAY_OFFSET differs between JVMs with different heap sizes (compressed oops on or off), so a LongToUnsafeRowMap serialized on one JVM and deserialized on another can be corrupted.

This PR changes the map to store a portable address (without Platform.LONG_ARRAY_OFFSET); the JVM-specific offset is added back only when a value is read.
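
To make the failure mode concrete, here is a minimal, self-contained sketch (not the actual Spark code). `SIZE_BITS` and the two base-offset values are illustrative assumptions; the point is only that `Platform.LONG_ARRAY_OFFSET` can differ between the JVM that serializes the map and the JVM that deserializes it (for example when one of them has a heap large enough to disable compressed oops):

```scala
// Sketch only: contrasts the old (non-portable) and new (portable) encodings.
object PortableAddressSketch {
  val SIZE_BITS = 28 // assumed to mirror the constant in LongToUnsafeRowMap

  // Old encoding: the writer bakes its own Array[Long] base offset into the address.
  def packNonPortable(offsetInPage: Long, size: Long, writerBase: Long): Long =
    ((offsetInPage + writerBase) << SIZE_BITS) | size

  // New encoding (this PR): only the page-relative offset is stored; the reader
  // adds its own Platform.LONG_ARRAY_OFFSET when it actually touches memory.
  def packPortable(offsetInPage: Long, size: Long): Long =
    (offsetInPage << SIZE_BITS) | size

  def main(args: Array[String]): Unit = {
    val writerBase = 16L   // e.g. serializing JVM with compressed oops
    val readerBase = 24L   // e.g. deserializing JVM with a 32G+ heap
    val dataStart  = 1024L // page-relative offset of one value

    val oldAddr    = packNonPortable(dataStart, 32L, writerBase)
    val oldDecoded = oldAddr >>> SIZE_BITS        // 1040 = 16 + 1024
    println(oldDecoded == readerBase + dataStart) // false: reads the wrong bytes

    val newAddr    = packPortable(dataStart, 32L)
    val newDecoded = (newAddr >>> SIZE_BITS) + readerBase
    println(newDecoded == readerBase + dataStart) // true: correct on any JVM
  }
}
```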

How was this patch tested?

Added a test case with randomly generated keys to improve coverage. It is not a regression test for the bug itself, though: reproducing the bug would require a Spark cluster whose driver or executors have at least a 32G heap.

davies (Contributor, author) commented Sep 1, 2016

cc @rxin @sitalkedia

@@ -448,7 +448,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
private def nextSlot(pos: Int): Int = (pos + 2) & mask

private def getRow(address: Long, resultRow: UnsafeRow): UnsafeRow = {
-    val offset = address >>> SIZE_BITS
+    val offset = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
hvanhovell (Contributor) commented Sep 1, 2016

This may look a bit redundant. How about creating two methods:

private[this] def toOffset(address: Long): Long = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
private[this] def toAddress(offset: Long): Long = (offset - Platform.LONG_ARRAY_OFFSET) << SIZE_BITS

To make it clearer what you are doing there. They should be inlined, so the performance overhead is minimal.
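
For context, the changed line sits inside `getRow` roughly as sketched below. The surrounding lines are not part of the hunk above and are reconstructed under the assumption that only the offset computation changed; `SIZE_MASK`, `page`, and `resultRow.pointTo` are taken from the surrounding class rather than from this diff:

```scala
// Reconstructed sketch of getRow after the change (assumptions noted above).
private def getRow(address: Long, resultRow: UnsafeRow): UnsafeRow = {
  // The stored address is now portable; the JVM-specific base offset of
  // Array[Long] is added back only here, at read time.
  val offset = (address >>> SIZE_BITS) + Platform.LONG_ARRAY_OFFSET
  val size = address & SIZE_MASK
  resultRow.pointTo(page, offset, size.toInt)
  resultRow
}
```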

SparkQA commented Sep 1, 2016

Test build #64804 has finished for PR 14927 at commit 0c8450c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 2, 2016

Test build #64809 has finished for PR 14927 at commit 09e1c89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

hvanhovell (Contributor) commented:

@davies are we making this assumption anywhere else in our unsafe code?

davies (Contributor, author) commented Sep 6, 2016

@hvanhovell This has an LGTM offline. Other contributors have also confirmed that this patch fixes the bug, so I'm going to merge it into master and branch-2.0.

asfgit closed this in f7e26d7 Sep 6, 2016
asfgit pushed a commit that referenced this pull request Sep 6, 2016
… in LongToUnsafeRowMap

Author: Davies Liu <[email protected]>

Closes #14927 from davies/longmap.

(cherry picked from commit f7e26d7)
Signed-off-by: Davies Liu <[email protected]>
hvanhovell (Contributor) commented:

LGTM

rishitesh pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Sep 26, 2016
… in LongToUnsafeRowMap

Author: Davies Liu <[email protected]>

Closes apache#14927 from davies/longmap.