[SPARK-12031][Core][BUG]: Integer overflow when do sampling #10023
Jenkins, test this please
  val rand = new XORShiftRandom(seed)
  while (input.hasNext) {
    val item = input.next()
-   val replacementIndex = rand.nextInt(i)
+   val replacementIndex = l < Int.MaxValue match {
I think the change is fine, except that the case true business strikes me as pointlessly verbose Scala. A plain if/else is much clearer.
Test build #46836 has finished for PR 10023 at commit
[info] - failing to fetch classes from HTTP server should not leak resources (SPARK-6209) *** FAILED *** (1 second, 392 milliseconds)
Test build #46837 has finished for PR 10023 at commit
  val replacementIndex = if (l < Int.MaxValue) {
    rand.nextInt(l.toInt)
  } else {
    rand.nextLong()
This isn't valid, because it chooses a value from (negative) Long.MIN_VALUE to Long.MAX_VALUE. You need to choose a number in [0, l).
Rather than even special case it, just use
val replacementIndex = (random.nextDouble() * l).toLong
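To make the range argument concrete, here is a minimal sketch. It assumes java.util.Random as a stand-in for Spark's XORShiftRandom, and the object and method names (IndexRangeSketch, scaledIndex, outOfRangeCount) are hypothetical, chosen only for illustration. It contrasts raw nextLong() draws, which almost never land in [0, l), with the scaled-double draw suggested above.

```scala
import java.util.Random

object IndexRangeSketch {
  // Hypothetical helper: scales a double from [0.0, 1.0) by the Long bound l.
  // The result is never negative; for bounds well below 2^53 it is also < l.
  def scaledIndex(rand: Random, l: Long): Long =
    (rand.nextDouble() * l).toLong

  // Counts how many of n raw nextLong() draws fall outside [0, l).
  // nextLong() covers the whole signed 64-bit range, so about half are negative.
  def outOfRangeCount(rand: Random, l: Long, n: Int): Int =
    (1 to n).count { _ =>
      val x = rand.nextLong()
      x < 0 || x >= l
    }
}
```

One caveat of the scaled-double approach: nextDouble() has 53 bits of resolution, so for extremely large l not every index is reachable. That is far beyond the item counts discussed in this thread.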
OK
Test build #46838 has finished for PR 10023 at commit
Test build #46866 has finished for PR 10023 at commit
LGTM. CC @rxin for a look if possible
  val rand = new XORShiftRandom(seed)
  while (input.hasNext) {
    val item = input.next()
-   val replacementIndex = rand.nextInt(i)
+   val replacementIndex = (rand.nextDouble() * l).toLong
cc @mengxr
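For readers following the diff, the loop can be sketched end to end as below. This is an illustrative reservoir sampler, not the code merged in this PR: it assumes java.util.Random instead of XORShiftRandom, the names ReservoirSketch and sampleAndCount are hypothetical, and it increments the counter before drawing so that the l-th item is kept with probability k/l.

```scala
import java.util.Random
import scala.reflect.ClassTag

object ReservoirSketch {
  // Returns (sample of at most k items, number of items seen).
  def sampleAndCount[T: ClassTag](input: Iterator[T], k: Int, seed: Long): (Array[T], Long) = {
    val reservoir = new Array[T](k)
    // Fill the reservoir with the first k items.
    var i = 0
    while (i < k && input.hasNext) {
      reservoir(i) = input.next()
      i += 1
    }
    if (i < k) {
      // Fewer than k items in total: return only the filled prefix.
      (reservoir.take(i), i.toLong)
    } else {
      // Long counter, so the count cannot wrap past Int.MaxValue.
      var l = i.toLong
      val rand = new Random(seed)
      while (input.hasNext) {
        val item = input.next()
        l += 1
        // Draw an index in [0, l); only indices below k replace a slot.
        val replacementIndex = (rand.nextDouble() * l).toLong
        if (replacementIndex < k) {
          reservoir(replacementIndex.toInt) = item
        }
      }
      (reservoir, l)
    }
  }
}
```

The only Int conversion left is replacementIndex.toInt, which is safe because it runs only when the index is below k.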
@mengxr I'm going to merge tomorrow to get this fix in for 1.6
@uncleGen How many items did you have to trigger this error? We assume that there are less than |
@mengxr In my case, I was sorting 10 TB of data in 96 partitions. During range partitioning, a "java.lang.IllegalArgumentException: n must be positive" exception was thrown. At that point the partition contained 10.3 GB of items, and the number of items exceeded the Int.MaxValue limit.
@mengxr Sorry, that was 48 executors with 2 cores, and 1024 partitions.
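The failure mode described here can be reproduced in isolation. The sketch below is an assumption-laden illustration rather than the Spark code path: it uses java.util.Random (which XORShiftRandom extends), and the names OverflowSketch, incrementPastMax, and nextIntFails are hypothetical. It shows an Int counter wrapping to a negative value and nextInt rejecting that value as a bound. The exact exception message depends on the JDK version.

```scala
import java.util.Random
import scala.util.Try

object OverflowSketch {
  // An Int counter silently wraps to Int.MinValue once it passes Int.MaxValue.
  def incrementPastMax(): Int = {
    var i = Int.MaxValue
    i += 1
    i
  }

  // True if nextInt(bound) throws IllegalArgumentException, which
  // java.util.Random does for any non-positive bound.
  def nextIntFails(bound: Int): Boolean =
    Try(new Random(0L).nextInt(bound)).failed.toOption
      .exists(_.isInstanceOf[IllegalArgumentException])
}
```

Calling OverflowSketch.nextIntFails(OverflowSketch.incrementPastMax()) returns true, which mirrors what happens when the item count in a single partition exceeds Int.MaxValue.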
@mengxr I'd like to merge this on the grounds that it can't hurt at the worst, and solves a problem that either is real now or could come up later.
Test build #2181 has finished for PR 10023 at commit
Author: uncleGen <[email protected]> Closes #10023 from uncleGen/1.6-bugfix. (cherry picked from commit a113216) Signed-off-by: Sean Owen <[email protected]>
Merged to master/1.6