Implement a Docker image generator gradle plugin for Spark applications #381

Merged on Aug 31, 2018 (30 commits)
Changes from 23 commits
Commits
42988ef Implement a Docker image generator gradle plugin for Spark applications. (mccheah, Jun 8, 2018)
4b32aa9 Fix circle (mccheah, Jun 9, 2018)
6bded05 Fix more circle (mccheah, Jun 9, 2018)
2bb1702 No need to setup docker for compilation only (mccheah, Jun 9, 2018)
cad9d41 Fix gradle (mccheah, Jun 9, 2018)
92139c0 Add license headers (mccheah, Jun 14, 2018)
5a847bb Merge remote-tracking branch 'palantir/master' into add-docker-image-… (mccheah, Jun 15, 2018)
46f23c9 Address comments. (mccheah, Jun 25, 2018)
71b835a Merge remote-tracking branch 'palantir/master' into add-docker-image-… (mccheah, Jun 25, 2018)
4149dfd Remove extra task (mccheah, Jun 25, 2018)
814978f Remove 2.11 (mccheah, Jun 25, 2018)
e123548 Remove extra script (mccheah, Jun 25, 2018)
6bf957a Remove some imports (mccheah, Jun 25, 2018)
3194cd1 Add more tests (mccheah, Jun 26, 2018)
53c0b1a Don't bundle resources in tgz, just include individual files in resou… (mccheah, Jul 18, 2018)
d046ab7 Merge remote-tracking branch 'palantir/master' into add-docker-image-… (mccheah, Jul 18, 2018)
186b3a3 Fix license placement. (mccheah, Jul 18, 2018)
0f6be98 Use InputFile and not Input (mccheah, Jul 18, 2018)
6352a26 Fix build (mccheah, Jul 18, 2018)
1fa699c Add back K8s integration tests (mccheah, Jul 19, 2018)
de403f0 Revert changes to SparkBuild (mccheah, Jul 19, 2018)
cad1bb1 Remove hive version 2.0.2 from test suite (mccheah, Jul 19, 2018)
1ae4f61 Use shared Spark session for unsafe row suite (mccheah, Jul 19, 2018)
9ea102a Merge remote-tracking branch 'palantir/master' into add-docker-image-… (mccheah, Jul 23, 2018)
94c6c55 Revert "Use shared Spark session for unsafe row suite" (mccheah, Jul 23, 2018)
999b5f8 Address comments. (mccheah, Aug 17, 2018)
6704dd5 Fix build script (mccheah, Aug 17, 2018)
e2c862d Fix build again (mccheah, Aug 17, 2018)
3017341 Fix licenses and ignore build dir licenses (mccheah, Aug 17, 2018)
fb57b7e Remove extraneous scripts (mccheah, Aug 31, 2018)
20 changes: 20 additions & 0 deletions .circleci/config.yml
@@ -163,6 +163,11 @@ jobs:
      - save_cache:
          key: build-maven-{{ .Branch }}-{{ .BuildNum }}
          paths: .
  build-spark-docker-gradle-plugin:
    <<: *defaults
    steps:
      - *checkout-code
      - run: dev/build-spark-docker-gradle-plugin.sh

  run-style-tests:
    # depends only on build-maven
@@ -252,6 +257,15 @@ jobs:
            - 'examples/target/scala-*/jars'
            - 'external/*/target/scala-*/*.jar'

  run-spark-docker-gradle-plugin-tests:
    <<: *test-defaults
    resource_class: small
    steps:
      - *checkout-code
      - setup_remote_docker
      - run: |
          dev/run-spark-docker-gradle-plugin-tests.sh | tee /tmp/run-spark-docker-gradle-plugin-tests.log

  run-backcompat-tests:
    # depends on build-sbt
    <<: *defaults
@@ -471,6 +485,8 @@ workflows:
          <<: *all-branches-and-tags
      - build-sbt:
          <<: *all-branches-and-tags
      - build-spark-docker-gradle-plugin:
          <<: *all-branches-and-tags
      - run-backcompat-tests:
          requires:
            - build-sbt
@@ -479,6 +495,10 @@ workflows:
          requires:
            - build-sbt
          <<: *all-branches-and-tags
      - run-spark-docker-gradle-plugin-tests:

Reviewer comment: You need to wire up publishing as well.

Author reply: Publishing should already be covered, because we put the Gradle publish call in publish_functions.sh.

          requires:
            - build-spark-docker-gradle-plugin
          <<: *all-branches-and-tags
      - run-python-tests:
          requires:
            - build-sbt
1 change: 1 addition & 0 deletions .gitignore
@@ -14,6 +14,7 @@
.ensime_cache/
.ensime_lucene
.generated-mima*
.gradle
.idea/
.idea_modules/
.project
23 changes: 23 additions & 0 deletions dev/build-spark-docker-gradle-plugin.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -euo pipefail
ROOT=$(git rev-parse --show-toplevel)
DOCKER_PLUGIN_PROJECT_DIR=$ROOT/resource-managers/kubernetes/docker
$DOCKER_PLUGIN_PROJECT_DIR/gradlew -p $DOCKER_PLUGIN_PROJECT_DIR --info compileJava compileTestJava
2 changes: 2 additions & 0 deletions dev/publish_functions.sh
@@ -22,6 +22,8 @@ publish_artifacts() {
  echo "</server></servers></settings>" >> $tmp_settings

  ./build/mvn -T 1C --settings $tmp_settings -DskipTests "${PALANTIR_FLAGS[@]}" deploy
  DOCKER_PLUGIN_PROJECT_DIR=resource-managers/kubernetes/docker
  $DOCKER_PLUGIN_PROJECT_DIR/gradlew -p $DOCKER_PLUGIN_PROJECT_DIR --info bintrayUpload
}

make_dist() {
22 changes: 22 additions & 0 deletions dev/run-spark-docker-gradle-plugin-tests.sh
@@ -0,0 +1,22 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
set -euo pipefail
ROOT=$(git rev-parse --show-toplevel)
DOCKER_PLUGIN_PROJECT_DIR=$ROOT/resource-managers/kubernetes/docker
$DOCKER_PLUGIN_PROJECT_DIR/gradlew -p $DOCKER_PLUGIN_PROJECT_DIR --info test

Reviewer comment: Add Gradle to the repository root and define only one subproject with that path, so that we can just run ./gradlew test. I don't see any benefit so far in nesting it and keeping it constrained.

4 changes: 4 additions & 0 deletions resource-managers/kubernetes/docker/.gitignore
@@ -0,0 +1,4 @@
src/main/resources/docker-resources
.gradle
build

109 changes: 109 additions & 0 deletions resource-managers/kubernetes/docker/build.gradle
@@ -0,0 +1,109 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import java.nio.charset.StandardCharsets

buildscript {
    repositories {
        jcenter()
        maven { url "http://palantir.bintray.com/releases" }
    }

    dependencies {
        classpath 'gradle.plugin.com.palantir:gradle-circle-style:1.1.2'

Reviewer comment: Can you use baseline?

        classpath 'com.jfrog.bintray.gradle:gradle-bintray-plugin:1.7.3'
        classpath 'com.netflix.nebula:nebula-dependency-recommender:5.2.0'
        classpath 'com.netflix.nebula:nebula-publishing-plugin:5.1.5'
    }
}

plugins {
    id 'com.palantir.git-version' version '0.9.1'
    id 'java-gradle-plugin'
}

repositories {
    jcenter()
    maven { url "http://palantir.bintray.com/releases" }
}

apply plugin: 'java'
apply plugin: 'idea'
apply plugin: 'nebula.dependency-recommender'
version System.env.CIRCLE_TAG ?: gitVersion()
group 'org.apache.spark'

sourceCompatibility = 1.8

dependencyRecommendations {
    strategy OverrideTransitives
    propertiesFile file: project.rootProject.file('versions.props')
}

test {
    minHeapSize = "512m"
    maxHeapSize = "512m"
}

dependencies {
    compileOnly gradleApi()

Reviewer comment: Make this compile rather than compileOnly; this is a weird dependency.

    compile 'org.apache.commons:commons-compress'
    compile 'commons-io:commons-io'
    testCompile 'org.assertj:assertj-core'
    testCompile 'org.mockito:mockito-core'
    testCompile 'junit:junit'
    testCompile 'com.spotify:docker-client'
    testCompile gradleApi()
    testCompile gradleTestKit()
}

def getGitRepoRoot = { ->

Reviewer comment: This is a closure whether you add -> or not, so you can remove it.

Author reply: This function isn't in the latest version.

Reviewer comment: This shouldn't be necessary. Let's make the root project be in the root of the repo.

    def bashStdOut = new ByteArrayOutputStream()
    exec {
        commandLine 'git', 'rev-parse', '--show-toplevel'
        standardOutput = bashStdOut
    }
    return new String(bashStdOut.toByteArray(), StandardCharsets.UTF_8).trim()
}


task prepareDockerBundleDir(type: Sync) {
    from('src/main/dockerfiles/spark/Dockerfile') {
        into 'kubernetes/dockerfiles/spark'
        rename 'Dockerfile', 'Dockerfile.original'
    }

    from('src/main/dockerfiles/spark/entrypoint.sh') {
        into 'kubernetes/dockerfiles/spark'
    }

    from(fileTree("$getGitRepoRoot/bin")) {

Reviewer comment: Can we just use a relative path here?

Author reply: I believe this doesn't apply anymore, given how we've rearranged the Gradle project.

Author reply: We now use rootDir instead of getGitRepoRoot. If we were to use relative paths, we'd have to refer to the parent path, e.g. from(fileTree("../bin")). I'm generally against referencing the parent path this way because it requires one to know where the script is running from. Referencing the absolute path gives us a path that's consistent regardless of where the script is running from, or if we modify the directory structure of this project later.

        into 'bin'
    }

    from(fileTree("$getGitRepoRoot/sbin")) {
        into 'sbin'
    }

    into file("src/main/resources/docker-resources")
    includeEmptyDirs = false
}

tasks.compileJava.dependsOn tasks.prepareDockerBundleDir
tasks.idea.dependsOn tasks.prepareDockerBundleDir

apply from: "${rootDir}/gradle/publish.gradle"
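
The trade-off raised in the review thread on this file (a relative ../bin versus a path anchored to a fixed root) can be sketched outside Gradle. The following is a minimal shell illustration, not code from this PR: the temporary layout (repo/bin, repo/plugin) and the helper names resolve_relative and resolve_anchored are hypothetical. It suggests that ../bin only resolves when the working directory happens to sit one level below the root, while an anchored path resolves the same way from anywhere.

```shell
#!/bin/sh
set -eu

# Hypothetical layout for illustration only: <tmp>/repo/bin and <tmp>/repo/plugin.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
repo="$demo/repo"
mkdir -p "$repo/bin" "$repo/plugin"

# Relative lookup: "../bin" is resolved against whatever directory we start in ($1).
resolve_relative() {
  (cd "$1" && cd ../bin 2>/dev/null && pwd -P) || echo "unresolved"
}

# Anchored lookup: the path is built from a fixed root, so the starting directory ($1) does not matter.
resolve_anchored() {
  (cd "$1" && cd "$repo/bin" && pwd -P)
}

echo "relative, from repo/plugin: $(resolve_relative "$repo/plugin")"
echo "relative, from repo:        $(resolve_relative "$repo")"
echo "anchored, from repo/plugin: $(resolve_anchored "$repo/plugin")"
echo "anchored, from repo:        $(resolve_anchored "$repo")"
```

In a Gradle build, rootDir plays the role of the fixed root in this sketch, which is the behavior the author's reply relies on.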
56 changes: 56 additions & 0 deletions resource-managers/kubernetes/docker/gradle/publish.gradle
@@ -0,0 +1,56 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

apply plugin: 'com.jfrog.bintray'
apply plugin: 'nebula.maven-base-publish'

Reviewer comment: Use nebula.maven-publish and remove the line below, as well as the bintrayUpload dependencies and publications. Just make publish depend on bintrayUpload (see baseline for reference).

apply plugin: 'nebula.maven-resolved-dependencies'
apply plugin: 'nebula.javadoc-jar'
apply plugin: 'nebula.source-jar'

jar {
    manifest {
        attributes(
                "Implementation-Title" : project.name,
                "Implementation-Version" : project.version,
                "Implementation-Vendor" : "Palantir Technologies Inc.")
    }
}

bintray {
    user = System.env.BINTRAY_USERNAME
    key = System.env.BINTRAY_PASSWORD
    publish = true
    pkg {
        repo = 'releases'
        name = 'spark'
        userOrg = 'palantir'
        licenses = ['Apache-2.0']
        publications = ['nebula']
    }
}

bintrayUpload.dependsOn { generatePomFileForNebulaPublication }
bintrayUpload.dependsOn { sourceJar }
bintrayUpload.dependsOn { build }

publishing {
    publications {
        nebula(MavenPublication) {
            from components.java
        }
    }
}
Binary file not shown.
@@ -0,0 +1,22 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.7-bin.zip

Reviewer comment: Use the latest Gradle version.

zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists