
[SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6 #14958

Closed · wants to merge 8 commits

Conversation

@a-roberts (Contributor)

What changes were proposed in this pull request?

Upgrades snappy-java from 1.1.2.4 to 1.1.2.6. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"
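The fixed bug is easiest to picture with a small sketch: snappy-java's stream magic header begins with the byte -126 (0x82 as a signed byte), so a reader that decides "compressed or not" by peeking only that first byte will misclassify raw data that happens to start with the same value. The following is an illustrative Java sketch, not snappy-java's actual implementation; the full 8-byte header value used here is an assumption for demonstration.

```java
public class MagicByteSketch {
    // First byte of snappy-java's stream magic header (0x82 == -126 as a signed byte).
    static final byte MAGIC_FIRST_BYTE = (byte) 0x82;

    // Assumed full header, for illustration only: {-126, 'S', 'N', 'A', 'P', 'P', 'Y', 0}.
    static final byte[] FULL_MAGIC =
        {MAGIC_FIRST_BYTE, 'S', 'N', 'A', 'P', 'P', 'Y', 0};

    // Buggy-style check: one byte is not enough to decide.
    static boolean naiveLooksCompressed(byte[] data) {
        return data.length > 0 && data[0] == MAGIC_FIRST_BYTE;
    }

    // Robust check: compare the whole header, as the 1.1.2.6 fix effectively does.
    static boolean looksCompressed(byte[] data) {
        if (data.length < FULL_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < FULL_MAGIC.length; i++) {
            if (data[i] != FULL_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Raw (uncompressed) data whose first byte collides with the magic byte.
        byte[] rawData = {MAGIC_FIRST_BYTE, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(naiveLooksCompressed(rawData)); // true  -> misclassified
        System.out.println(looksCompressed(rawData));      // false -> correct
    }
}
```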

How was this patch tested?

Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)

@srowen (Member)

srowen commented Sep 5, 2016

Looks OK for master/2.0, and even 1.6 as it's already on 1.1.2.1, and changes from 1.1.2.1 to 1.1.2.4 look like strict fixes/improvements. https://github.com/xerial/snappy-java/blob/master/Milestone.md

@a-roberts (Contributor, Author)

Yep, builds are in progress on our farm against the latest 1.6 code using 1.1.2.6, so I'll update the PR if there are no problems.

@jaceklaskowski (Contributor)

It would be nice to have a test in Spark showing that the snappy changes actually matter to Spark (not that I'm against the upgrade -- just as a safety measure to confirm it improves things, if possible).

@a-roberts (Contributor, Author)

Good point. xerial/snappy-java@60cc0c2 shows the changes that were made, so we could add a checkMagicHeader equivalent to the existing Spark tests; I'll look into it.

@SparkQA

SparkQA commented Sep 5, 2016

Test build #64932 has finished for PR 14958 at commit 3795997.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member)

srowen commented Sep 6, 2016

I don't know that the bug that was fixed has manifested yet in Spark; certainly not in the unit tests. It seems like something that could affect some execution out in the wild. It seems reasonable to take the fix as I don't see a downside to it. I don't know if we need to reproduce this same bug; snappy tests have done that.

@jaceklaskowski (Contributor)

Fair enough. It was just an idea that would make the upgrade even more compelling. The issues snappy has fixed could manifest in various ways in Spark, and testing for them could uncover others. Yes, again, it could be a project on its own. Sorry for the noise.

@markhamstra (Contributor)

@srowen The bug does manifest in Spark. I've been running this upgrade in production for a few weeks with no issues. This should be merged IMO.

@a-roberts (Contributor, Author)

I'm almost done adding the test case anyway; it involved a quick rewrite into Scala, and I had fun turning the test's for loop into a while loop. I think the bug would manifest if your very first byte happened to be -126 and you're using Snappy.

At the very least I brushed up on my Scala skills, and I think it's good practice to add a test alongside an implementation change, so I'll update the PR once it's done. If Snappy were to change (e.g. a 1.1.3 release altered the magic header logic), we'd at least be another project that catches the problem.

@srowen (Member)

srowen commented Sep 6, 2016

Yeah, I'm cool to merge this now, but if you have a test, OK. Is it testing Spark, or just testing Snappy, though?

@a-roberts (Contributor, Author)

It's a straight-up rewrite, so it's just testing that the Snappy version we include has the magic header handling in it; I kept it simple.

var buf = new Array[Byte](4096)

var readBytes = 0
while (readBytes != -1) {
Review comment from @a-roberts (Contributor, Author):

This loop makes me weep; it's an attempt at rewriting the following in Scala:

for (int readBytes = 0; (readBytes = input.read(buf)) != -1; ) {
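For reference, the quoted assignment-in-condition idiom expands into the following complete, runnable copy loop (a self-contained Java sketch using in-memory streams; the stream names and the `drain` helper are illustrative, not the PR's actual test code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CopyLoop {
    // Drain an InputStream with the assignment-in-condition for loop
    // quoted above; read() returns -1 at end of stream.
    static byte[] drain(InputStream input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int readBytes = 0; (readBytes = input.read(buf)) != -1; ) {
            out.write(buf, 0, readBytes);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, -126, 5};
        byte[] copied = drain(new ByteArrayInputStream(data));
        System.out.println(copied.length); // prints 5
    }
}
```

In Scala, which has no C-style for loop, the same pattern is usually written as a while loop that performs the read before the test and again at the end of the body, which is roughly what the snippet above attempts.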

@SparkQA

SparkQA commented Sep 6, 2016

Test build #65000 has finished for PR 14958 at commit accc88f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("SPARK 17378: snappy-java should handle magic header when reading stream") {
  val b = new ByteArrayOutputStream()
  // Write an uncompressed length beginning with -126 (the same as magic header[0])
  b.write(-126) // Can't access the magic header's first byte directly as it isn't public
Review comment from @srowen (Member):

Hm, this seems to tie this test to this internal detail of Snappy though. Spark itself doesn't really need to assert this detail in a test.

I feel like this test is just testing snappy, which snappy can test. I could see testing a case at the level of Spark that triggers this bug and verifies it's fixed.

Reply from @a-roberts (Contributor, Author):

Yeah, I agree. So how about we just revert the test case commit here and merge the 1.1.2.6 change itself, since folks want it, and then add an extra test for robustness in a later PR if we want to?

It was a duplicate of the Snappy test rather than exercising the problem from Spark; something to think about later.
@SparkQA

SparkQA commented Sep 6, 2016

Test build #65001 has finished for PR 14958 at commit 022cad7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 6, 2016
## What changes were proposed in this pull request?

Upgrades snappy-java from 1.1.2.4 to 1.1.2.6. The release notes (https://github.com/xerial/snappy-java/blob/master/Milestone.md) mention: "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)"

## How was this patch tested?
Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian)

Author: Adam Roberts <[email protected]>

Closes #14958 from a-roberts/master.

(cherry picked from commit 6c08dbf)
Signed-off-by: Sean Owen <[email protected]>
@asfgit asfgit closed this in 6c08dbf Sep 6, 2016
asfgit pushed a commit that referenced this pull request Sep 6, 2016

Author: Adam Roberts <[email protected]>

Closes #14958 from a-roberts/master.

(cherry picked from commit 6c08dbf)
Signed-off-by: Sean Owen <[email protected]>
@srowen (Member)

srowen commented Sep 6, 2016

Merged to master/2.0/1.6

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Sep 7, 2016

Author: Adam Roberts <[email protected]>

Closes apache#14958 from a-roberts/master.

(cherry picked from commit 6c08dbf)
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit af8e097)
@yhuai (Contributor)

yhuai commented Sep 7, 2016

Using `mvn` from path: /home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Spark's published dependencies DO NOT MATCH the manifest file (dev/spark-deps).
To update the manifest file, run './dev/test-dependencies.sh --replace-manifest'.
diff --git a/dev/deps/spark-deps-hadoop-1 b/dev/pr-deps/spark-deps-hadoop-1
index dd5a6dc..a97b10c 100644
--- a/dev/deps/spark-deps-hadoop-1
+++ b/dev/pr-deps/spark-deps-hadoop-1
@@ -143,7 +143,7 @@ servlet-api-2.5.jar
 slf4j-api-1.7.10.jar
 slf4j-log4j12-1.7.10.jar
 snappy-0.2.jar
-snappy-java-1.1.2.1.jar
+snappy-java-1.1.2.6.jar
 spire-macros_2.10-0.7.4.jar
 spire_2.10-0.7.4.jar
 stax-api-1.0.1.jar
Using `mvn` from path: /home/jenkins/workspace/spark-branch-1.6-lint/build/apache-maven-3.3.9/bin/mvn
Build step 'Execute shell' marked build as failure
Finished: FAILURE

Can you take a look at the 1.6 build (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-1.6-lint/262/console)? It seems the 1.6 build is broken by this PR.

@srowen (Member)

srowen commented Sep 7, 2016

@yhuai ack, OK, I'll fix that. I must have missed something in the deps file when I resolved conflicts on the cherry-pick; it can't be complex to touch up. Edit: ah, right -- we have a Hadoop 1 deps file in branch-1.6. Fixing it in #14992.
