-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6 #14958
Conversation
Looks OK for master/2.0, and even 1.6 as it's already on 1.1.2.1, and changes from 1.1.2.1 to 1.1.2.4 look like strict fixes/improvements. https://github.com/xerial/snappy-java/blob/master/Milestone.md |
Yep, builds in progress on our farm against the latest 1.6 code using 1.1.2.6 so will improve the PR should there be no problems |
Would be nice to have a test in Spark to see if the changes to snappy made any sense to Spark (not that I'm against -- just as a safety measure that it did improve things if possible). |
Good point, xerial/snappy-java@60cc0c2 shows the changes that were made, so we would have a checkMagicHeader equivalent added to the existing Spark tests, I'll look into it |
Test build #64932 has finished for PR 14958 at commit
|
I don't know that the bug that was fixed has manifested yet in Spark; certainly not in the unit tests. It seems like something that could affect some execution out in the wild. It seems reasonable to take the fix as I don't see a downside to it. I don't know if we need to reproduce this same bug; snappy tests have done that. |
Fair enough. It was just an idea that would make the upgrade even more super-needed. The issues that snappy has fixed could manifest in various ways in Spark and while doing so could uncover others. Yes, again, it could be a project on its own. Sorry for the noise. |
@srowen The bug does manifest in Spark. I've been running this upgrade in production for a few weeks with no issues. This should be merged IMO. |
I'm almost done with adding in the test case anyway, involved a quick rewrite to use Scala code, had fun rewriting the for loop for the test into a while, I think it would manifest itself if your very first byte happened to be -126 and you're using Snappy. At the very least I brushed up on my Scala skills and I think it's good practice to add a test with an implementation change so I'll update the PR once it's done. If Snappy were to change (e.g. 1.1.3 were to be released but the magic header logic changes) we'd at least be another project that picks up the problem. |
Yeah I'm cool to merge this now, but if you have a test, OK. Is it testing Spark or just testing Snappy though? |
Probably needs TLC with the while loop, this is based on https://github.com/xerial/snappy-java/blob/60cc0c2e1d1a76ae2981d0572a5164fcfdfba5f1/src/test/java/org/xerial/snappy/SnappyInputStreamTest.java but outside of the snappy package so we can't use MAGIC_HEADER[0].
Straight up rewrite so it's just testing that the Snappy version we include happens to have the magic header handling in, kept it simple |
var buf = new Array[Byte](4096) | ||
|
||
var readBytes = 0 | ||
while (readBytes != -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop makes me weep, it's an attempt @ rewriting the following but in Scala
for (int readBytes = 0; (readBytes = input.read(buf)) != -1; ) {
Test build #65000 has finished for PR 14958 at commit
|
test("SPARK 17378: snappy-java should handle magic header when reading stream") { | ||
val b = new ByteArrayOutputStream() | ||
// Write uncompressed length beginning with -126 (the same with magicheader[0]) | ||
b.write(-126) // Can't access magic header[0] as it isn't public, so access this way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this seems to tie this test to this internal detail of Snappy though. Spark itself doesn't really need to assert this detail in a test.
I feel like this test is just testing snappy, which snappy can test. I could see testing a case at the level of Spark that triggers this bug and verifies it's fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree, so how about we just revert the test case commit here and merge the 1.1.2.6 change itself as folks want it, and then in a later PR add an extra test for robustness if we want to.
Was a duplicate of the Snappy test instead of exercising the problem from Spark, something to think about later
Test build #65001 has finished for PR 14958 at commit
|
## What changes were proposed in this pull request? Upgrades the Snappy version to 1.1.2.6 from 1.1.2.4, release notes: https://github.com/xerial/snappy-java/blob/master/Milestone.md mention "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)" ## How was this patch tested? Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian) Author: Adam Roberts <[email protected]> Closes #14958 from a-roberts/master. (cherry picked from commit 6c08dbf) Signed-off-by: Sean Owen <[email protected]>
Upgrades the Snappy version to 1.1.2.6 from 1.1.2.4, release notes: https://github.com/xerial/snappy-java/blob/master/Milestone.md mention "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)" Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian) Author: Adam Roberts <[email protected]> Closes #14958 from a-roberts/master. (cherry picked from commit 6c08dbf) Signed-off-by: Sean Owen <[email protected]>
Merged to master/2.0/1.6 |
Upgrades the Snappy version to 1.1.2.6 from 1.1.2.4, release notes: https://github.com/xerial/snappy-java/blob/master/Milestone.md mention "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (apache#142)" Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian) Author: Adam Roberts <[email protected]> Closes apache#14958 from a-roberts/master. (cherry picked from commit 6c08dbf) Signed-off-by: Sean Owen <[email protected]> (cherry picked from commit af8e097)
Can you take a look at 1.6 build (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-1.6-lint/262/console)? Seems the 1.6 build is broken by this pr. |
What changes were proposed in this pull request?
Upgrades the Snappy version to 1.1.2.6 from 1.1.2.4, release notes: https://github.com/xerial/snappy-java/blob/master/Milestone.md mention "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"
How was this patch tested?
Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)