
[SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6 #14958

Closed · wants to merge 8 commits

Conversation

@a-roberts (Contributor)

What changes were proposed in this pull request?

Upgrades snappy-java from 1.1.2.4 to 1.1.2.6. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"
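The fixed bug is easiest to picture with a small sketch: snappy-java's stream magic header begins with the byte -126 (0x82 as a signed byte), so a reader that decides "compressed or not" by peeking only that first byte will misclassify raw data that happens to start with the same value. The following is an illustrative Java sketch, not snappy-java's actual implementation; the full 8-byte header value used here is an assumption for demonstration.

```java
public class MagicByteSketch {
    // First byte of snappy-java's stream magic header (0x82 == -126 as a signed byte).
    static final byte MAGIC_FIRST_BYTE = (byte) 0x82;

    // Assumed full header, for illustration only: {-126, 'S', 'N', 'A', 'P', 'P', 'Y', 0}.
    static final byte[] FULL_MAGIC =
        {MAGIC_FIRST_BYTE, 'S', 'N', 'A', 'P', 'P', 'Y', 0};

    // Buggy-style check: one byte is not enough to decide.
    static boolean naiveLooksCompressed(byte[] data) {
        return data.length > 0 && data[0] == MAGIC_FIRST_BYTE;
    }

    // Robust check: compare the whole header, as the 1.1.2.6 fix effectively does.
    static boolean looksCompressed(byte[] data) {
        if (data.length < FULL_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < FULL_MAGIC.length; i++) {
            if (data[i] != FULL_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Raw (uncompressed) data whose first byte collides with the magic byte.
        byte[] rawData = {MAGIC_FIRST_BYTE, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(naiveLooksCompressed(rawData)); // true  -> misclassified
        System.out.println(looksCompressed(rawData));      // false -> correct
    }
}
```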

How was this patch tested?

Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)

@srowen (Member)

srowen commented Sep 5, 2016

Looks OK for master/2.0, and even 1.6 as it's already on 1.1.2.1, and changes from 1.1.2.1 to 1.1.2.4 look like strict fixes/improvements. https://github.com/xerial/snappy-java/blob/master/Milestone.md

@a-roberts (Contributor, Author)

Yep, builds are in progress on our farm against the latest 1.6 code using 1.1.2.6, so I'll update the PR if there are no problems.

@jaceklaskowski (Contributor)

It would be nice to have a test in Spark showing that the snappy changes actually matter to Spark (not that I'm against the upgrade -- just as a safety measure to confirm it improves things, if possible).

@a-roberts (Contributor, Author)

Good point. xerial/snappy-java@60cc0c2 shows the changes that were made, so we could add a checkMagicHeader equivalent to the existing Spark tests; I'll look into it.

@SparkQA

SparkQA commented Sep 5, 2016

Test build #64932 has finished for PR 14958 at commit 3795997.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member)

srowen commented Sep 6, 2016

I don't know that the bug that was fixed has manifested yet in Spark; certainly not in the unit tests. It seems like something that could affect some execution out in the wild. It seems reasonable to take the fix as I don't see a downside to it. I don't know if we need to reproduce this same bug; snappy tests have done that.

@jaceklaskowski (Contributor)

Fair enough. It was just an idea that would make the upgrade even more compelling. The issues snappy has fixed could manifest in various ways in Spark, and testing for them could uncover others. Yes, again, it could be a project on its own. Sorry for the noise.

@markhamstra (Contributor)

@srowen The bug does manifest in Spark. I've been running this upgrade in production for a few weeks with no issues. This should be merged IMO.

@a-roberts (Contributor, Author)

I'm almost done adding the test case anyway; it involved a quick rewrite into Scala, and I had fun turning the test's for loop into a while loop. I think the bug would manifest if your very first byte happened to be -126 and you're using Snappy.

At the very least I brushed up on my Scala skills, and I think it's good practice to add a test alongside an implementation change, so I'll update the PR once it's done. If Snappy were to change (e.g. a 1.1.3 release altered the magic header logic), we'd at least be another project that catches the problem.

@srowen (Member)

srowen commented Sep 6, 2016

Yeah, I'm cool to merge this now, but if you have a test, OK. Is it testing Spark, or just testing Snappy, though?

@a-roberts (Contributor, Author)

It's a straight-up rewrite, so it's just testing that the Snappy version we include has the magic header handling in it; I kept it simple.

var buf = new Array[Byte](4096)

var readBytes = 0
while (readBytes != -1) {
Review comment from @a-roberts (Contributor, Author):

This loop makes me weep; it's an attempt at rewriting the following in Scala:

for (int readBytes = 0; (readBytes = input.read(buf)) != -1; ) {
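For reference, the quoted assignment-in-condition idiom expands into the following complete, runnable copy loop (a self-contained Java sketch using in-memory streams; the stream names and the `drain` helper are illustrative, not the PR's actual test code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CopyLoop {
    // Drain an InputStream with the assignment-in-condition for loop
    // quoted above; read() returns -1 at end of stream.
    static byte[] drain(InputStream input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int readBytes = 0; (readBytes = input.read(buf)) != -1; ) {
            out.write(buf, 0, readBytes);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, -126, 5};
        byte[] copied = drain(new ByteArrayInputStream(data));
        System.out.println(copied.length); // prints 5
    }
}
```

In Scala, which has no C-style for loop, the same pattern is usually written as a while loop that performs the read before the test and again at the end of the body, which is roughly what the snippet above attempts.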

@SparkQA

SparkQA commented Sep 6, 2016

Test build #65000 has finished for PR 14958 at commit accc88f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("SPARK 17378: snappy-java should handle magic header when reading stream") {
  val b = new ByteArrayOutputStream()
  // Write an uncompressed length beginning with -126 (the same as magic header[0])
  b.write(-126) // Can't access the magic header's first byte directly as it isn't public
Review comment from @srowen (Member):

Hm, this seems to tie this test to this internal detail of Snappy though. Spark itself doesn't really need to assert this detail in a test.

I feel like this test is just testing snappy, which snappy can test. I could see testing a case at the level of Spark that triggers this bug and verifies it's fixed.

Reply from @a-roberts (Contributor, Author):

Yeah, I agree. So how about we just revert the test case commit here and merge the 1.1.2.6 change itself, since folks want it, and then add an extra test for robustness in a later PR if we want to?

It was a duplicate of the Snappy test rather than exercising the problem from Spark; something to think about later.
@SparkQA

SparkQA commented Sep 6, 2016

Test build #65001 has finished for PR 14958 at commit 022cad7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 6, 2016
## What changes were proposed in this pull request?

Upgrades snappy-java from 1.1.2.4 to 1.1.2.6. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"

## How was this patch tested?
Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)

Author: Adam Roberts <[email protected]>

Closes #14958 from a-roberts/master.

(cherry picked from commit 6c08dbf)
Signed-off-by: Sean Owen <[email protected]>
@asfgit asfgit closed this in 6c08dbf Sep 6, 2016
asfgit pushed a commit that referenced this pull request Sep 6, 2016

Author: Adam Roberts <[email protected]>

Closes #14958 from a-roberts/master.

(cherry picked from commit 6c08dbf)
Signed-off-by: Sean Owen <[email protected]>
@srowen (Member)

srowen commented Sep 6, 2016

Merged to master/2.0/1.6

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Sep 7, 2016

Author: Adam Roberts <[email protected]>

Closes apache#14958 from a-roberts/master.

(cherry picked from commit 6c08dbf)
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit af8e097)
@yhuai (Contributor)

yhuai commented Sep 7, 2016

Using `mvn` from path: /home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Spark's published dependencies DO NOT MATCH the manifest file (dev/spark-deps).
To update the manifest file, run './dev/test-dependencies.sh --replace-manifest'.
diff --git a/dev/deps/spark-deps-hadoop-1 b/dev/pr-deps/spark-deps-hadoop-1
index dd5a6dc..a97b10c 100644
--- a/dev/deps/spark-deps-hadoop-1
+++ b/dev/pr-deps/spark-deps-hadoop-1
@@ -143,7 +143,7 @@ servlet-api-2.5.jar
 slf4j-api-1.7.10.jar
 slf4j-log4j12-1.7.10.jar
 snappy-0.2.jar
-snappy-java-1.1.2.1.jar
+snappy-java-1.1.2.6.jar
 spire-macros_2.10-0.7.4.jar
 spire_2.10-0.7.4.jar
 stax-api-1.0.1.jar
Using `mvn` from path: /home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Build step 'Execute shell' marked build as failure
Finished: FAILURE

Can you take a look at the 1.6 build (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-1.6-lint/262/console)? It seems the 1.6 build is broken by this PR.

@srowen (Member)

srowen commented Sep 7, 2016

@yhuai ack, OK, I'll fix that. I must have missed something in the deps file when I resolved conflicts on the cherry-pick; it can't be complex to touch up. Edit: ah, right -- we have a Hadoop 1 deps file in branch-1.6. Fixing it in #14992.
