This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

[WIP] Secure HDFS Support #373

Closed
Changes from all commits (535 commits)
8ca6a82
[SPARK-18572][SQL] Add a method `listPartitionNames` to `ExternalCata…
Dec 6, 2016
655297b
[SPARK-18721][SS] Fix ForeachSink with watermark + append
zsxwing Dec 6, 2016
e362d99
[SPARK-18634][SQL][TRIVIAL] Touch-up Generate
hvanhovell Dec 6, 2016
ace4079
[SPARK-18714][SQL] Add a simple time function to SparkSession
rxin Dec 6, 2016
d20e0d6
[SPARK-18671][SS][TEST] Added tests to ensure stability of that all S…
tdas Dec 6, 2016
65f5331
[SPARK-18652][PYTHON] Include the example data and third-party licens…
lins05 Dec 6, 2016
9b5bc2a
[SPARK-18734][SS] Represent timestamp in StreamingQueryProgress as fo…
tdas Dec 7, 2016
3750c6e
[SPARK-18671][SS][TEST-MAVEN] Follow up PR to fix test for Maven
tdas Dec 7, 2016
340e9ae
[SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.…
yanboliang Dec 7, 2016
99c293e
[SPARK-18701][ML] Fix Poisson GLM failure due to wrong initialization
actuaryzhang Dec 7, 2016
51754d6
[SPARK-18678][ML] Skewed reservoir sampling in SamplingUtils
srowen Dec 7, 2016
4432a2a
[SPARK-18208][SHUFFLE] Executor OOM due to a growing LongArray in Byt…
Dec 7, 2016
5dbcd4f
[SPARK-17760][SQL] AnalysisException with dataframe pivot when groupB…
aray Dec 7, 2016
acb6ac5
[SPARK-18764][CORE] Add a warning log when skipping a corrupted file
zsxwing Dec 7, 2016
76e1f16
[SPARK-18762][WEBUI] Web UI should be http:4040 instead of https:4040
sarutak Dec 7, 2016
e9b3afa
[SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDa…
zsxwing Dec 7, 2016
1c64197
[SPARK-18754][SS] Rename recentProgresses to recentProgress
marmbrus Dec 7, 2016
839c2eb
[SPARK-18633][ML][EXAMPLE] Add multiclass logistic regression summary…
wangmiao1981 Dec 8, 2016
617ce3b
[SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery…
tdas Dec 8, 2016
ab865cf
[SPARK-18705][ML][DOC] Update user guide to reflect one pass solver f…
sethah Dec 8, 2016
1c3f1da
[SPARK-18326][SPARKR][ML] Review SparkR ML wrappers API for 2.1
yanboliang Dec 8, 2016
0807174
Preparing Spark release v2.1.0-rc2
pwendell Dec 8, 2016
48aa677
Preparing development version 2.1.1-SNAPSHOT
pwendell Dec 8, 2016
9095c15
[SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide
yanboliang Dec 8, 2016
726217e
[SPARK-18667][PYSPARK][SQL] Change the way to group row in BatchEvalP…
viirya Dec 8, 2016
e0173f1
[SPARK-16589] [PYTHON] Chained cartesian produces incorrect number of…
aray Dec 8, 2016
d69df90
[SPARK-18590][SPARKR] build R source package when making distribution
felixcheung Dec 8, 2016
a035644
[SPARK-18751][CORE] Fix deadlock when SparkContext.stop is called in …
zsxwing Dec 8, 2016
9483242
[SPARK-18760][SQL] Consistent format specification for FileFormats
rxin Dec 8, 2016
e43209f
[SPARK-18590][SPARKR] Change the R source build to Hadoop 2.6
shivaram Dec 8, 2016
fcd22e5
[SPARK-18776][SS] Make Offset for FileStreamSource corrected formatte…
tdas Dec 9, 2016
1cafc76
[SPARK-18774][CORE][SQL] Ignore non-existing files when ignoreCorrupt…
zsxwing Dec 9, 2016
ef5646b
[SPARKR][PYSPARK] Fix R source package name to match Spark version. R…
shivaram Dec 9, 2016
4ceed95
[SPARK-18349][SPARKR] Update R API documentation on ml model summary
wangmiao1981 Dec 9, 2016
e8f351f
Copy the SparkR source package with LFTP
shivaram Dec 9, 2016
2c88e1d
Copy pyspark and SparkR packages to latest release dir too
felixcheung Dec 9, 2016
72bf519
[SPARK-18637][SQL] Stateful UDF should be considered as nondeterministic
Dec 9, 2016
b226f10
[MINOR][CORE][SQL][DOCS] Typo fixes
jaceklaskowski Dec 9, 2016
0c6415a
[SPARK-17822][R] Make JVMObjectTracker a member variable of RBackend
mengxr Dec 9, 2016
eb2d9bf
[MINOR][SPARKR] Fix SparkR regex in copy command
shivaram Dec 9, 2016
562507e
[SPARK-18745][SQL] Fix signed integer overflow due to toInt cast
kiszk Dec 9, 2016
e45345d
[SPARK-18812][MLLIB] explain "Spark ML"
mengxr Dec 10, 2016
8bf56cc
[SPARK-18807][SPARKR] Should suppress output print for calls to JVM m…
felixcheung Dec 10, 2016
b020ce4
[SPARK-18811] StreamSource resolution should happen in stream executi…
brkyvz Dec 10, 2016
2b36f49
[SPARK-17460][SQL] Make sure sizeInBytes in Statistics will not overflow
huaxingao Dec 10, 2016
83822df
[MINOR][DOCS] Remove Apache Spark Wiki address
dongjoon-hyun Dec 10, 2016
5151daf
[SPARK-3359][DOCS] Fix greater-than symbols in Javadoc to allow build…
michalsenkyr Dec 10, 2016
de21ca4
[SPARK-18815][SQL] Fix NPE when collecting column stats for string/bi…
wzhfy Dec 11, 2016
d4c03f8
[SQL][MINOR] simplify a test to fix the maven tests
cloud-fan Dec 11, 2016
d5f1416
[SPARK-18628][ML] Update Scala param and Python param to have quotes
krishnakalyan3 Dec 11, 2016
63693c1
[SPARK-18790][SS] Keep a general offset history of stream batches
tcondie Dec 12, 2016
3501160
[DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Dec 12, 2016
523071f
[SPARK-18681][SQL] Fix filtering to compatible with partition keys of…
wangyum Dec 12, 2016
1aeb7f4
[SPARK-18810][SPARKR] SparkR install.spark does not work for RCs, sna…
felixcheung Dec 12, 2016
9dc5fa5
[SPARK-18796][SS] StreamingQueryManager should not block when startin…
zsxwing Dec 13, 2016
9f0e3be
[SPARK-18797][SPARKR] Update spark.logit in sparkr-vignettes
wangmiao1981 Dec 13, 2016
207107b
[SPARK-18835][SQL] Don't expose Guava types in the JavaTypeInference …
Dec 13, 2016
d5c4a5d
[SPARK-18840][YARN] Avoid throw exception when getting token renewal …
jerryshao Dec 13, 2016
292a37f
[SPARK-18816][WEB UI] Executors Logs column only ran visibility check…
ajbozarth Dec 13, 2016
f672bfd
[SPARK-18843][CORE] Fix timeout in awaitResultInForkJoinSafely (branc…
zsxwing Dec 13, 2016
25b9758
[SPARK-18834][SS] Expose event time stats through StreamingQueryProgress
tdas Dec 13, 2016
5693ac8
[SPARK-18793][SPARK-18794][R] add spark.randomForest/spark.gbt to vig…
mengxr Dec 14, 2016
019d1fa
[SPARK-18588][TESTS] Ignore KafkaSourceStressForDontFailOnDataLossSuite
zsxwing Dec 14, 2016
8ef0059
[MINOR][SPARKR] fix kstest example error and add unit test
wangmiao1981 Dec 14, 2016
f999312
[SPARK-18814][SQL] CheckAnalysis rejects TPCDS query 32
nsyca Dec 14, 2016
16d4bd4
[SPARK-18730] Post Jenkins test report page instead of the full conso…
liancheng Dec 14, 2016
af12a21
[SPARK-18753][SQL] Keep pushed-down null literal as a filter in Spark…
HyukjinKwon Dec 14, 2016
e8866f9
[SPARK-18853][SQL] Project (UnaryNode) is way too aggressive in estim…
rxin Dec 14, 2016
c4de90f
[SPARK-18852][SS] StreamingQuery.lastProgress should be null when rec…
zsxwing Dec 14, 2016
d0d9c57
[SPARK-18795][ML][SPARKR][DOC] Added KSTest section to SparkR vignettes
jkbradley Dec 14, 2016
280c35a
[SPARK-18854][SQL] numberedTreeString and apply(i) inconsistent for s…
rxin Dec 15, 2016
0d94201
[SPARK-18865][SPARKR] SparkR vignettes MLP and LDA updates
wangmiao1981 Dec 15, 2016
cb2c842
[SPARK-18856][SQL] non-empty partitioned table should not report zero…
cloud-fan Dec 15, 2016
b14fc39
[SPARK-18869][SQL] Add TreeNode.p that returns BaseType
rxin Dec 15, 2016
d399a29
[SPARK-18875][SPARKR][DOCS] Fix R API doc generation by adding `DESCR…
dongjoon-hyun Dec 15, 2016
2a8de2e
[SPARK-18849][ML][SPARKR][DOC] vignettes final check update
felixcheung Dec 15, 2016
e430915
[SPARK-18870] Disallowed Distinct Aggregations on Streaming Datasets
tdas Dec 15, 2016
900ce55
[SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource
zsxwing Dec 15, 2016
b6a81f4
[SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_se…
brkyvz Dec 15, 2016
ef2ccf9
Preparing Spark release v2.1.0-rc3
pwendell Dec 15, 2016
a7364a8
Preparing development version 2.1.1-SNAPSHOT
pwendell Dec 15, 2016
08e4272
[SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListenerSuite: single…
brkyvz Dec 15, 2016
ae853e8
[MINOR] Only rename SparkR tar.gz if names mismatch
shivaram Dec 16, 2016
ec31726
Preparing Spark release v2.1.0-rc4
pwendell Dec 16, 2016
62a6577
Preparing development version 2.1.1-SNAPSHOT
pwendell Dec 16, 2016
b23220f
[MINOR] Handle fact that mv is different on linux, mac
shivaram Dec 16, 2016
cd0a083
Preparing Spark release v2.1.0-rc5
pwendell Dec 16, 2016
2ffed59
[SPARK-18278] Minimal support for submitting to Kubernetes.
mccheah Dec 6, 2016
00e545f
Fix style
mccheah Dec 6, 2016
cdbd9bb
Make naming more consistent
mccheah Dec 7, 2016
8f69fc0
Fix building assembly with Kubernetes.
mccheah Dec 9, 2016
75c6086
Service account support, use constants from fabric8 library.
mccheah Dec 10, 2016
93b75ce
Some small changes
mccheah Jan 7, 2017
e7397e8
Use k8s:// formatted URL instead of separate setting.
mccheah Jan 9, 2017
ed65428
Reindent comment to conform to JavaDoc style
foxish Jan 9, 2017
f9ddb63
Move kubernetes under resource-managers folder.
mccheah Jan 9, 2017
178abc1
Use tar and gzip to compress+archive shipped jars (#2)
mccheah Jan 11, 2017
e2787e8
Use alpine and java 8 for docker images. (#10)
mccheah Jan 12, 2017
acceb72
Copy the Dockerfiles from docker-minimal-bundle into the distribution…
mccheah Jan 12, 2017
24f4bf0
inherit IO (#13)
foxish Jan 12, 2017
adcc906
Error messages when the driver container fails to start. (#11)
mccheah Jan 13, 2017
0b81dbf
Fix linter error to make CI happy (#18)
foxish Jan 13, 2017
e70f427
Documentation for the current state of the world (#16)
mccheah Jan 13, 2017
b25bc8b
Development workflow documentation for the current state of the world…
mccheah Jan 13, 2017
761b317
Added service name as prefix to executor pods (#14)
foxish Jan 13, 2017
8739b41
Add kubernetes profile to travis CI yml file (#21)
kimoonkim Jan 14, 2017
928e00e
Improved the example commands in running-on-k8s document. (#25)
lins05 Jan 17, 2017
3e3c4d4
Fix spacing for command highlighting (#31)
foxish Jan 18, 2017
36c4e94
Support custom labels on the driver pod. (#27)
mccheah Jan 19, 2017
b6c57c7
Make pod name unique using the submission timestamp (#32)
foxish Jan 19, 2017
3fd9c62
A number of small tweaks to the MVP. (#23)
mccheah Jan 24, 2017
81875a6
Correct hadoop profile: hadoop2.7 -> hadoop-2.7 (#41)
ash211 Jan 25, 2017
2a26ebd
Support setting the driver pod launching timeout. (#36)
lins05 Jan 25, 2017
b98c852
Sanitize kubernetesAppId for use in secret, service, and pod names (#45)
ash211 Jan 25, 2017
27f3005
Support spark.driver.extraJavaOptions (#48)
kimoonkim Jan 26, 2017
48f5884
Use "extraScalaTestArgs" to pass extra options to scalatest. (#52)
lins05 Jan 26, 2017
81bd355
Use OpenJDK8's official Alpine image. (#51)
mccheah Jan 26, 2017
86bd589
Remove unused driver extra classpath upload code (#54)
mccheah Jan 26, 2017
e6f35d2
Fix k8s integration tests (#44)
lins05 Jan 27, 2017
6cceb59
Added GC to components (#56)
foxish Jan 27, 2017
3b5901a
Create README to better describe project purpose (#50)
ash211 Jan 28, 2017
2e992be
Access the Driver Launcher Server over NodePort for app launch + subm…
mccheah Jan 30, 2017
b2e6877
Extract constants and config into separate file. Launch => Submit. (#65)
mccheah Jan 31, 2017
6ee3be5
Retry the submit-application request to multiple nodes (#69)
mccheah Feb 2, 2017
d0f95db
Allow adding arbitrary files (#71)
mccheah Feb 2, 2017
de9a82e
Fix NPE around unschedulable pod specs (#79)
ash211 Feb 2, 2017
fae76a0
Introduce blocking submit to kubernetes by default (#53)
ash211 Feb 3, 2017
4bc7c52
Do not wait for pod finishing in integration tests. (#84)
lins05 Feb 3, 2017
52a7ab2
Check for user jars/files existence before creating the driver pod. (…
lins05 Feb 8, 2017
487d1e1
Use readiness probe instead of client-side ping. (#75)
mccheah Feb 9, 2017
bdfc4e1
Note integration tests require Java 8 (#99)
ash211 Feb 10, 2017
fe8b45c
Bumping up kubernetes-client version to fix GKE and local proxy (#105)
foxish Feb 10, 2017
7a4075f
Truncate k8s hostnames to be no longer than 63 characters (#102)
ash211 Feb 11, 2017
3d80fff
Fixed loading the executors page through the kubectl proxy. (#95)
lins05 Feb 13, 2017
a34a114
Filter nodes to only try and send files to external IPs (#106)
foxish Feb 13, 2017
ac4dd91
Parse results of minikube status more rigorously (#97)
ash211 Feb 13, 2017
2112c4a
Adding legacyHostIP to the list of IPs we look at (#114)
foxish Feb 14, 2017
043cdd9
Add -DskipTests to dev docs (#115)
ash211 Feb 15, 2017
0e6df11
Shutdown the thread scheduler in LoggingPodStatusWatcher on receiving…
varunkatta Feb 16, 2017
a800e20
Trigger scalatest plugin in the integration-test phase (#93)
kimoonkim Feb 16, 2017
2773b77
Fix issue with DNS resolution (#118)
foxish Feb 16, 2017
6a999ca
Change the API contract for uploading local files (#107)
mccheah Feb 16, 2017
cad5dd3
Optionally expose the driver UI port as NodePort (#131)
kimoonkim Feb 22, 2017
68a83a2
Set the REST service's exit code to the exit code of its driver subpr…
ash211 Feb 23, 2017
1ab6dbc
Pass the actual iterable from the option to get files (#139)
mccheah Feb 23, 2017
bb5cb21
Use a separate class to track components that need to be cleaned up (…
mccheah Feb 23, 2017
04a555e
Enable unit tests in Travis CI build (#132)
kimoonkim Feb 23, 2017
d7f41c5
Change driver pod's restart policy from OnFailure to Never (#145)
ash211 Feb 23, 2017
b4b1bdd
Extract SSL configuration handling to a separate class (#123)
mccheah Feb 24, 2017
39c2cf2
Exclude known flaky tests (#156)
kimoonkim Feb 24, 2017
2303aad
Richer logging and better error handling in driver pod watch (#154)
foxish Feb 24, 2017
e7f78cb
Document blocking submit calls (#152)
ash211 Feb 25, 2017
fd24f23
Allow custom annotations on the driver pod. (#163)
mccheah Mar 2, 2017
7132f5d
Update client version & minikube version (#142)
foxish Mar 2, 2017
a51dcc8
Allow customizing external URI provision + External URI can be set vi…
mccheah Mar 3, 2017
a14dc1e
Remove okhttp from top-level pom (#166)
foxish Mar 3, 2017
015f18d
Allow setting memory on the driver submission server. (#161)
mccheah Mar 3, 2017
f414355
Add a section for prerequisites (#171)
foxish Mar 4, 2017
6cf635d
Add instructions to find master URL (#169)
foxish Mar 4, 2017
191dd51
Propagate exceptions (#172)
mccheah Mar 6, 2017
dc4e3d2
Logging for resource deletion (#170)
ash211 Mar 6, 2017
3636939
Fix pom versions (#178)
foxish Mar 14, 2017
2382ea6
Exclude flaky ExternalShuffleServiceSuite from Travis (#185)
kimoonkim Mar 15, 2017
b139b46
Fix lint-check failures and javadoc8 break (#187)
ash211 Mar 16, 2017
8c08189
Docs improvements (#176)
foxish Mar 8, 2017
8756494
Add Apache license to a few files (#175)
ash211 Mar 8, 2017
fece639
Adding clarification pre-alpha (#181)
foxish Mar 8, 2017
35724a3
Allow providing an OAuth token for authenticating against k8s (#180)
mccheah Mar 13, 2017
f9f5af4
Merge pull request #177 from apache-spark-on-k8s/prep-for-alpha-release
ash211 Mar 16, 2017
d5502ed
Allow the driver pod's credentials to be shipped from the submission …
ash211 Mar 17, 2017
078697f
Support using PEM files to configure SSL for driver submission (#173)
mccheah Mar 20, 2017
7039934
Update tags on docker images. (#196)
foxish Mar 21, 2017
3254246
Add additional instructions to use release tarball (#198)
foxish Mar 22, 2017
35a5e32
Support specify CPU cores for driver pod (#207)
hustcat Mar 30, 2017
0a13206
Register executors using pod IPs instead of pod host names (#215)
kimoonkim Apr 5, 2017
13f16d5
Upgrade bouncycastle, force bcprov version (#223)
mccheah Apr 10, 2017
c6a5c6e
Stop executors cleanly before deleting their pods (#231)
ash211 Apr 13, 2017
0b0fb6f
Upgrade Kubernetes client to 2.2.13. (#230)
mccheah Apr 14, 2017
1388e0a
Respect JVM http proxy settings when using Feign. (#228)
mccheah Apr 17, 2017
3f6e5ea
Staging server for receiving application dependencies. (#212)
mccheah Apr 21, 2017
e24c4af
Reorganize packages between v1 work and v2 work (#220)
mccheah Apr 21, 2017
4940eae
Support SSL on the file staging server (#221)
mccheah Apr 21, 2017
04afcf8
Driver submission with mounting dependencies from the staging server …
mccheah Apr 25, 2017
6b489c2
Enable testing against GCE clusters (#243)
foxish May 2, 2017
0e1cb40
Update running-on-kubernetes.md (#259)
erikerlandson May 2, 2017
ba151c0
Build with sbt and fix scalastyle checks. (#241)
lins05 May 3, 2017
4ac0de1
Updating images in doc (#219)
foxish May 3, 2017
8ccb305
Correct readme links (#266)
johscheuer May 5, 2017
0a8080a
edit readme with a working build example command (#254)
erikerlandson May 9, 2017
26f747e
Fix watcher conditional logic (#269)
erikerlandson May 10, 2017
546f09c
Dispatch tasks to right executors that have tasks' input HDFS data (#…
kimoonkim May 10, 2017
eb45ae5
Add parameter for driver pod name (#258)
hustcat May 16, 2017
e9da549
Dynamic allocation (#272)
foxish May 17, 2017
f005268
Download remotely-located resources on driver and executor startup vi…
mccheah May 17, 2017
e071ad9
Scalastyle fixes (#278)
ash211 May 17, 2017
6882a1b
Exit properly when the k8s cluster is not available. (#256)
lins05 May 18, 2017
9d6665c
Support driver pod kubernetes credentials mounting in V2 submission (…
mccheah May 18, 2017
88306b2
Allow client certificate PEM for resource staging server. (#257)
mccheah May 19, 2017
8f6f0a0
Differentiate between URI and SSL settings for in-cluster vs. submiss…
mccheah May 19, 2017
408c65f
Monitor pod status in submission v2. (#283)
mccheah May 22, 2017
8f3d965
Replace submission v1 with submission v2. (#286)
mccheah May 23, 2017
56414f9
Added files should be in the working directories. (#294)
mccheah May 23, 2017
fe03c7c
Add missing license (#296)
mccheah May 24, 2017
3881404
Remove some leftover code and fix a constant. (#297)
mccheah May 24, 2017
b84cb66
Adding restart policy fix for v2 (#303)
foxish May 25, 2017
dbf7a39
Add all dockerfiles to distributions. (#307)
mccheah May 26, 2017
2a2cfb6
Add proxy configuration to retrofit clients. (#301)
mccheah May 26, 2017
d31d81a
Fix an HDFS data locality bug in case cluster node names are short ho…
kimoonkim May 26, 2017
0702e18
Remove leading slash from Retrofit interface. (#308)
mccheah May 30, 2017
9be8f20
Use tini in Docker images (#320)
mccheah May 31, 2017
e5623b7
Allow custom executor labels and annotations (#321)
mccheah Jun 1, 2017
5e2b205
Dynamic allocation, cleanup in case of driver death (#319)
foxish Jun 2, 2017
bb1b234
Fix client to await the driver pod (#325)
kimoonkim Jun 2, 2017
e37b0cf
Clean up resources that are not used by pods. (#305)
mccheah Jun 3, 2017
c325691
Copy yaml files when making distribution (#327)
tnachen Jun 4, 2017
d835b6a
Allow docker image pull policy to be configurable (#328)
tnachen Jun 5, 2017
4751371
POM update 0.2.0 (#329)
foxish Jun 5, 2017
5470366
Update tags (#332)
foxish Jun 6, 2017
ca4309f
nicer readme (#333)
foxish Jun 6, 2017
0dd146c
Support specify CPU cores and Memory restricts for driver (#340)
duyanghao Jun 8, 2017
bcf57cf
Generate the application ID label irrespective of app name. (#331)
mccheah Jun 8, 2017
78baf9b
Create base-image and minimize layer count (#324)
johscheuer Jun 8, 2017
2f80b1d
Added log4j config for k8s unit tests. (#314)
lins05 Jun 9, 2017
d4ec136
Use node affinity to launch executors on preferred nodes benefitting …
kimoonkim Jun 14, 2017
d6a3111
Fix sbt build. (#344)
mccheah Jun 14, 2017
fdd50f1
New API for custom labels and annotations. (#346)
mccheah Jun 14, 2017
a6291c6
Allow spark driver find shuffle pods in specified namespace (#357)
Jun 22, 2017
08fe944
Bypass init-containers when possible (#348)
chenchun Jun 23, 2017
8b3248f
Config for hard cpu limit on pods; default unlimited (#356)
Jun 23, 2017
6f6cfd6
Allow number of executor cores to have fractional values (#361)
liyinan926 Jun 29, 2017
befcf0a
Python Bindings for launching PySpark Jobs from the JVM (#364)
ifilonenko Jul 3, 2017
0f4368f
Submission client redesign to use a step-based builder pattern (#365)
mccheah Jul 14, 2017
8c35d81
Add implicit conversions to imports. (#374)
mccheah Jul 17, 2017
db5f5be
Fix import order and scalastyle (#375)
ash211 Jul 17, 2017
8751a9a
fix submit job errors (#376)
Jul 18, 2017
6dbd32e
Add node selectors for driver and executor pods (#355)
Jul 18, 2017
3ec9410
Retry binding server to random port in the resource staging server te…
mccheah Jul 19, 2017
e1ff2f0
set RestartPolicy=Never for executor (#367)
Jul 19, 2017
b1c48f9
Read classpath entries from SPARK_EXTRA_CLASSPATH on executors. (#383)
mccheah Jul 20, 2017
70e4e32
Initial architecture design for HDFS support
ifilonenko Jul 15, 2017
4345752
Minor styling
ifilonenko Jul 15, 2017
1d19f7d
Added proper logic for mounting ConfigMaps
ifilonenko Jul 18, 2017
adf44cf
styling
ifilonenko Jul 18, 2017
8b71168
modified otherKubernetesResource logic
ifilonenko Jul 18, 2017
03ff1ba
fixed Integration tests and modified HADOOP_CONF_DIR variable to be F…
ifilonenko Jul 18, 2017
a6431a0
setting HADOOP_CONF_DIR env variables
ifilonenko Jul 18, 2017
dc8f2eb
Included integration tests for Stage 1
ifilonenko Jul 18, 2017
82e073b
Initial Kerberos support
ifilonenko Jul 19, 2017
3f1c567
initial Stage 2 architecture using deprecated 2.1 methods
ifilonenko Jul 21, 2017
50c8fbf
Added current, BROKEN, integration test environment for review
ifilonenko Jul 26, 2017
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE
@@ -7,4 +7,4 @@
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
Please review http://spark.apache.org/contributing.html before opening a pull request.
2 changes: 2 additions & 0 deletions .gitignore
@@ -57,6 +57,8 @@ project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
22 changes: 17 additions & 5 deletions .travis.yml
@@ -25,11 +25,22 @@
sudo: required
dist: trusty

# 2. Choose language and target JDKs for parallel builds.
# 2. Choose language, target JDK and environment variables for parallel builds.
language: java
jdk:
- oraclejdk7
- oraclejdk8
env: # Used by the install section below.
# Configure the unit test build for spark core and kubernetes modules,
# while excluding some flaky unit tests using a regex pattern.
- PHASE=test \
PROFILES="-Pmesos -Pyarn -Phadoop-2.7 -Pkubernetes" \
MODULES="-pl core,resource-managers/kubernetes/core -am" \
ARGS="-Dtest=none -Dsuffixes='^org\.apache\.spark\.(?!ExternalShuffleServiceSuite|SortShuffleSuite$|rdd\.LocalCheckpointSuite$|deploy\.SparkSubmitSuite$|deploy\.StandaloneDynamicAllocationSuite$).*'"
# Configure the full build.
- PHASE=install \
PROFILES="-Pmesos -Pyarn -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver" \
MODULES="" \
ARGS="-T 4 -q -DskipTests"

# 3. Setup cache directory for SBT and Maven.
cache:
Expand All @@ -41,11 +52,12 @@ cache:
notifications:
email: false

# 5. Run maven install before running lint-java.
# 5. Run maven build before running lints.
install:
- export MAVEN_SKIP_RC=1
- build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
- build/mvn ${PHASE} ${PROFILES} ${MODULES} ${ARGS}

# 6. Run lint-java.
# 6. Run lints.
script:
- dev/lint-java
- dev/lint-scala
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -1,12 +1,12 @@
## Contributing to Spark

*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
[Contributing to Spark guide](http://spark.apache.org/contributing.html).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
- Is this a new feature that can stand alone as a [third party project](http://spark.apache.org/third-party-projects.html) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
3 changes: 0 additions & 3 deletions NOTICE
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
This product includes/uses ASM (http://asm.ow2.org/),
Copyright (c) 2000-2007 INRIA, France Telecom.

This product includes/uses org.json (http://www.json.org/java/index.html),
Copyright (c) 2002 JSON.org

This product includes/uses JLine (http://jline.sourceforge.net/),
Copyright (c) 2002-2006, Marc Prud'hommeaux <[email protected]>.

91 changes: 91 additions & 0 deletions R/CRAN_RELEASE.md
@@ -0,0 +1,91 @@
# SparkR CRAN Release

To release SparkR as a package to CRAN, we use the `devtools` package. Please work with the
`dev@spark.apache.org` community and the R package maintainer on this.
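
As a one-time setup - a sketch, with the package set assumed from the tools named in this guide and in `pkg/DESCRIPTION` - the R tooling can be installed from a shell:

```sh
# Install the R packages the release process relies on (assumed set).
Rscript -e 'install.packages(c("devtools", "knitr", "rmarkdown", "testthat"), repos = "https://cloud.r-project.org")'
```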

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips the manual and vignette checks; it is therefore preferable to run `R CMD check` on the manually built source package before uploading a release. Also note that for the CRAN checks of PDF vignettes to succeed, the `qpdf` tool must be installed (to install it, e.g. `yum -q -y install qpdf`).
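
A quick preflight check - a sketch assuming a yum- or apt-based host - helps catch a missing `qpdf` before the full check runs:

```sh
# Verify qpdf is on the PATH before running R CMD check with vignettes.
command -v qpdf || sudo yum -q -y install qpdf      # RHEL/CentOS
# command -v qpdf || sudo apt-get install -y qpdf   # Debian/Ubuntu
```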

To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script, along with comments on the status of every `WARNING` (there should be none) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built - make sure `SPARK_HOME` is set and the Spark jars are accessible.

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, for example to inspect the resulting `.tar.gz` file content, we also use the `devtools` package.

The source package is what gets released to CRAN. CRAN then builds platform-specific binary packages from the source package.

#### Build source package

To build the source package locally without releasing to CRAN, run the following in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes

inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman

build/
vignette.rds

man/
*.Rd files...

vignettes/
sparkr-vignettes.Rmd
```

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

With "2.1.0" replaced with the version of SparkR.

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run the following in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```
10 changes: 5 additions & 5 deletions R/README.md
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R

Libraries of SparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
sparkR.session()
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
The [instructions](http://spark.apache.org/contributing.html) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.
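
The resulting edit-test loop, sketched with the scripts named above (run from `$SPARK_HOME`):

```sh
# R-only change workflow: re-install the package, then run the R tests.
R/install-dev.sh
R/run-tests.sh
```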

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`.
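
For example - assuming `devtools` and `knitr` are installed - regenerating the docs is a single script invocation (run from `$SPARK_HOME`):

```sh
# Regenerate the SparkR Rd/HTML docs; output lands in R/pkg/html.
R/create-docs.sh
```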

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
50 changes: 44 additions & 6 deletions R/check-cran.sh
@@ -34,13 +34,30 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Build the latest docs
# Install the package (this is required for code in vignettes to run when building it later)
# Build the latest docs, but not vignettes, which is built with the package next
$FWDIR/create-docs.sh

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
# Build source package with vignettes
SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
. "${SPARK_HOME}"/bin/load-spark-env.sh
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ -d "$SPARK_JARS_DIR" ]; then
# Build a zip file containing the source package with vignettes
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Error Spark JARs not found in $SPARK_HOME"
exit 1
fi

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +71,32 @@ fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
then
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi

# Install source package to get it to generate vignettes rds files, etc.
if [ -n "$CLEAN_INSTALL" ]
then
echo "Removing lib path and installing from source package"
LIB_DIR="$FWDIR/lib"
rm -rf $LIB_DIR
mkdir -p $LIB_DIR
"$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR

# Zip the SparkR package so that it can be distributed to worker nodes on YARN
pushd $LIB_DIR > /dev/null
jar cfM "$LIB_DIR/sparkr.zip" SparkR
popd > /dev/null
fi

popd > /dev/null
19 changes: 1 addition & 18 deletions R/create-docs.sh
@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
2 changes: 1 addition & 1 deletion R/install-dev.sh
@@ -46,7 +46,7 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
echo "USING R_HOME = $R_HOME"
echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"

# Generate Rd files if devtools is installed
"$R_SCRIPT_PATH/"Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
3 changes: 3 additions & 0 deletions R/pkg/.Rbuildignore
@@ -1,5 +1,8 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
^cran-comments\.md$
^NEWS\.md$
^README\.Rmd$
^src-native$
^html$
12 changes: 7 additions & 5 deletions R/pkg/DESCRIPTION
@@ -1,26 +1,27 @@
Package: SparkR
Type: Package
Version: 2.1.0
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2016-08-27
Description: The SparkR package provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "shivaram@cs.berkeley.edu"),
person("Xiangrui", "Meng", role = "aut",
email = "meng@databricks.com"),
person("Felix", "Cheung", role = "aut",
email = "felixcheung@apache.org"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
knitr,
rmarkdown,
testthat,
e1071,
survival
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
'schema.R'
'generics.R'
@@ -48,3 +49,4 @@ Collate:
'utils.R'
'window.R'
RoxygenNote: 5.0.1
VignetteBuilder: knitr