
Merge Hadoop Into Spark #286

Closed
wants to merge 1 commit into from

Conversation

@pwendell (Contributor) commented Apr 1, 2014

This patch merges the Hadoop 0.20.2 source code into the Spark project. I've thought about this a bunch and this will provide us with several benefits:

More source code

Let's be honest: to be taken seriously as a project, Spark needs to have _way more_ lines of code. Spark is currently 70,000 lines of Scala code - this patch adds 452,000 lines of XML alone (!) This will make our GitHub stats look great!

Seamless builds

Sometimes users stumble trying to build Spark against Hadoop. Not anymore!! With Hadoop inside of Spark, this won't be a problem at all. I mean, there's basically only one Hadoop version, right? So this should work for pretty much everyone. Wait, hold on, is hadoop-0.20.2 the same as hadoop-2.2.0? I'm assuming it's the same because they both have the same number of "2"s.

Your favorite old configs

This patch will give users access to some of their favorite old configs from Hadoop. Did you just figure out what `dfs.namenode.path.based.cache.block.map.allocation.percent` was?! Now you can use it in Spark!! Pining for your old friend `mapreduce.map.skip.proc.count.autoincr`... fuggedabodit - we got ya!

I plan to contribute tests and docs in a subsequent patch. Please merge this ASAP and include it in Spark 0.9.1.

NB: This diff is too large for GitHub to render. Users will have to download and play with this on their own.
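
For anyone wondering how Hadoop properties actually reach a Spark job, here is a minimal sketch, assuming a local SparkContext; the property name is just the one joked about above, and the value is arbitrary:

```
// A minimal sketch, assuming a local SparkContext: Hadoop properties can be
// handed to a Spark job through the Hadoop Configuration that SparkContext
// exposes. The property name below is just the one joked about above.
import org.apache.spark.{SparkConf, SparkContext}

object OldConfigSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("old-config-sketch").setMaster("local[*]"))

    // Forward a Hadoop property to the underlying Hadoop Configuration.
    sc.hadoopConfiguration.set(
      "dfs.namenode.path.based.cache.block.map.allocation.percent", "0.25")

    // Round trip, to show the setting really landed.
    println(sc.hadoopConfiguration.get(
      "dfs.namenode.path.based.cache.block.map.allocation.percent"))

    sc.stop()
  }
}
```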

@rxin (Contributor) commented Apr 1, 2014

+1 !!! I have been asking for this multiple times on the mailing list and finally see the light!!!

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13636/

@aarondav (Contributor) commented Apr 1, 2014

But will this work on YARN?

@mengxr (Contributor) commented Apr 1, 2014

The best way to have more lines of code than some other project is to merge that project in!

@shivaram (Contributor) commented Apr 1, 2014

I love hadoop-0.20.2 -- It is the best Hadoop I have ever used. Thanks @pwendell for pulling this in.

@mridulm (Contributor) commented Apr 1, 2014

LGTM, merged!

@andrewor14 (Contributor)

LGTM

@CodingCat (Contributor)

Great job!

@sryza (Contributor) commented Apr 1, 2014

I'm OK with holding off on this for a separate JIRA, but for completeness I'd like to propose merging in CDH code as well. I believe that CDH4.2.2 would be the best choice of version, as it maintains 2-compatibility with 0.20.2 and 2.2.0.

@aarondav (Contributor) commented Apr 1, 2014

Guys, I just realized that this includes a really cool submodule called "Map Reduce" which seems to be a generalized form of Spark! It also has around 3x the amount of code, and that's in Java (which is somewhat more concise than Scala), so I estimate it's roughly 5x as awesome as Spark. Are there plans for deprecating the Spark API in favor of "Map Reduce"?

If we deprecate Spark in 0.9.1 and replace it with Map Reduce in 1.0.0, that should give users enough time to migrate to the new API (which is much simpler -- only two functions!).
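
For anyone playing along, the "two functions" in question look roughly like this when written against Spark's own RDD API (a sketch only; the input and output paths are hypothetical placeholders):

```
// A tongue-in-cheek sketch of the "two functions", written against Spark's
// RDD API rather than the org.apache.hadoop.mapreduce interfaces.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed in this era of Spark

object TwoFunctions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("two-functions").setMaster("local[*]"))

    val counts = sc.textFile("hdfs:///tmp/input")  // hypothetical path
      .flatMap(_.split("\\s+"))                    // the "map" half
      .map(word => (word, 1))
      .reduceByKey(_ + _)                          // the "reduce" half

    counts.saveAsTextFile("hdfs:///tmp/output")    // hypothetical path
    sc.stop()
  }
}
```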

@mattf (Contributor) commented Apr 1, 2014

+1, lgtm, but only today!

@mateiz (Contributor) commented Apr 1, 2014

Wait, so did it pass or fail Jenkins tests? Jenkins isn't saying what happened, only that the build finished.

Anyway, I'd be okay with this as long as you make it meet our code style guidelines.

@pwendell closed this Apr 2, 2014
@CrazyJvm (Contributor) commented Apr 2, 2014

Amazing! I've been looking forward to it for a long time!

@DavyLin commented Apr 2, 2014

I don't really understand it yet, but I'm looking forward to it!

@liancheng (Contributor)

This is a good start! Why not merge all the transitive dependencies into Spark as well? That way we can save a lot of effort on SBT/Maven and avoid time-consuming Ivy dependency resolution.

You know, ever since those `.orbit` things were added, `sbt update` costs nearly 40 minutes to resolve all the dependencies in mainland China... I have to turn it off with `skip in update := true` and `offline := true`.

OK, the 2nd paragraph is not part of the joke :-)
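
For readers who want that workaround, a minimal build.sbt sketch (sbt 0.13-era syntax, using the standard `offline` and `update` keys):

```
// A minimal sketch of the workaround above, in sbt 0.13-era syntax.
// Run once online first so the local Ivy cache is populated; after that,
// these settings keep sbt from going back to remote repositories.
offline := true          // resolve only from the local cache
skip in update := true   // skip the update task entirely
```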

@yinxusen (Contributor) commented Apr 2, 2014

Yep, I find that each time I run `sbt clean gen-idea`, `sbt update`, or even `sbt testOnly xxx`, I can do the cooking, take a shower, and have a rest.

@pelick commented Apr 2, 2014

Is this part actually true? “Wait, hold on, is hadoop-0.20.2 the same as hadoop-2.2.0? I'm assuming it's the same because they both have the same number of "2"s.”

@win2cs commented Apr 2, 2014

"But will this work on YARN?", I've the same question.
And does it still support specifying other Hadoop distribution like before?

@zsxwing (Member) commented Apr 2, 2014

You know, ever since those `.orbit` things were added, `sbt update` costs nearly 40 minutes to resolve all the dependencies in mainland China... I have to turn it off with `skip in update := true` and `offline := true`.

+1. A really terrible nightmare. If the version of some jar is 6.4, it usually cannot be downloaded in mainland China without a proxy :(

asfgit pushed a commit that referenced this pull request Apr 8, 2014
JIRA issue: [SPARK-1402](https://issues.apache.org/jira/browse/SPARK-1402)

This PR provides 3 more compression schemes for Spark SQL in-memory columnar storage:

* `BooleanBitSet`
* `IntDelta`
* `LongDelta`

Now there are 6 compression schemes in total, including the no-op `PassThrough` scheme.

Also fixed a bug in PR #286: not all compression schemes are added as available schemes when accessing an in-memory column, and when a column is compressed with an unrecognised scheme, `ColumnAccessor` throws an exception.

Author: Cheng Lian <[email protected]>

Closes #330 from liancheng/moreCompressionSchemes and squashes the following commits:

1d037b8 [Cheng Lian] Fixed SPARK-1436: in-memory column byte buffer must be able to be accessed multiple times
d7c0e8f [Cheng Lian] Added test suite for IntegralDelta (IntDelta & LongDelta)
3c1ad7a [Cheng Lian] Added test suite for BooleanBitSet, refactored other test suites
44fe4b2 [Cheng Lian] Refactored CompressionScheme, added 3 more compression schemes.
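
For readers unfamiliar with the technique behind `IntDelta` and `LongDelta`, here is a minimal, self-contained sketch of delta encoding; it illustrates the idea only and is not Spark's actual `CompressionScheme` code:

```
// Illustration only, not Spark's CompressionScheme implementation.
// Runs of nearby integers compress well because the deltas are small
// and can be packed into fewer bits; the first value is kept as-is.
object DeltaEncodingSketch {
  // [7, 9, 10, 10, 12] -> [7, 2, 1, 0, 2]
  def encode(values: Seq[Int]): Seq[Int] =
    if (values.length < 2) values
    else values.head +: values.sliding(2).map { case Seq(a, b) => b - a }.toSeq

  // Inverse: a running sum restores the original values.
  def decode(encoded: Seq[Int]): Seq[Int] =
    encoded.scanLeft(0)(_ + _).tail

  def main(args: Array[String]): Unit = {
    val xs = Seq(7, 9, 10, 10, 12)
    assert(decode(encode(xs)) == xs)
    println(encode(xs).mkString(", "))  // 7, 2, 1, 0, 2
  }
}
```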
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
lins05 pushed a commit to lins05/spark that referenced this pull request May 30, 2017
* Replace submission v1 with submission v2.

* Address documentation changes.

* Fix documentation
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
…public-clouds-job

Refactor docker machine job to run a specific test set and add a cleanup mechanism
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
dongjoon-hyun pushed a commit that referenced this pull request Jul 16, 2021
### What changes were proposed in this pull request?

This PR upgrades `GenJavadoc` plugin from `0.17` to `0.18`.

### Why are the changes needed?

`0.18` includes a bug fix for `Scala 2.13`.
```
This release fixes a bug (#286) with Scala 2.13.6 in relation with deprecated annotations in Scala sources leading to a NoSuchElementException in some cases.
```
https://github.com/lightbend/genjavadoc/releases/tag/v0.18

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Built the doc for Scala 2.13.
```
build/sbt -Phive -Phive-thriftserver -Pyarn -Pmesos -Pkubernetes -Phadoop-cloud -Pspark-ganglia-lgpl -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pscala-2.13 unidoc
```

Closes #33383 from sarutak/upgrade-genjavadoc-0.18.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jul 16, 2021
(cherry picked from commit ad744fb)
Signed-off-by: Dongjoon Hyun <[email protected]>