diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index c00d0db63cd10..3c786a6344066 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -99,6 +99,7 @@
  • Spark Standalone
  • Mesos
  • YARN
+ • Kubernetes
diff --git a/docs/index.md b/docs/index.md
index 57b9fa848f4a3..81d37aa5f63a1 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -113,6 +113,7 @@ options for deployment:
 * [Mesos](running-on-mesos.html): deploy a private cluster using [Apache Mesos](http://mesos.apache.org)
 * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
+* [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes

 **Other Documents:**

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
new file mode 100644
index 0000000000000..5192d9d086618
--- /dev/null
+++ b/docs/running-on-kubernetes.md
@@ -0,0 +1,224 @@
---
layout: global
title: Running Spark on Kubernetes
---

Support for running on [Kubernetes](https://kubernetes.io/) is currently experimental. The feature set is limited and
not well-tested, and should not be used in production environments.

## Setting Up Docker Images

Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
frequently used with Kubernetes, so Spark provides some support for working with Docker to get started quickly.

To use Spark on Kubernetes with Docker, images for the driver and the executors need to be built and published to an
accessible Docker registry. Spark distributions include the Docker files for the driver and the executor at
`dockerfiles/driver/Dockerfile` and `dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build
the Docker images, and then tag them with the registry that the images should be pushed to. Finally, push the images
to the registry.

For example, if the registry host is `registry-host` and the registry is listening on port 5000:

    cd $SPARK_HOME
    docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
    docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
    docker push registry-host:5000/spark-driver:latest
    docker push registry-host:5000/spark-executor:latest

## Submitting Applications to Kubernetes

Kubernetes applications can be executed via `spark-submit`. For example, to compute the value of pi, assuming the
images are set up as described above:

    bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      --master k8s://https://<apiserver-host>:<apiserver-port> \
      --kubernetes-namespace default \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      examples/jars/spark-examples_2.11-2.2.0.jar

The Spark master, specified either by passing the `--master` command line argument to `spark-submit` or by setting
`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing
the master string with `k8s://` causes the Spark application to launch on the Kubernetes cluster, with the API server
being contacted at `api_server_url`. The HTTP protocol must also be specified.

Note that applications can currently only be executed in cluster mode, where the driver and its executors run on the
cluster.
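If the API server address is not known in advance, one way to discover it is with `kubectl cluster-info`, which
prints the cluster's API server endpoint. The address and port below are placeholders for illustration only, and the
exact output wording varies by Kubernetes version:

    # Ask kubectl for the cluster's API server endpoint.
    kubectl cluster-info
    # e.g. "Kubernetes master is running at https://192.168.99.100:8443"

    # That endpoint, prefixed with k8s://, becomes the value passed to --master:
    #   --master k8s://https://192.168.99.100:8443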
### Adding Other JARs

Spark allows users to provide dependencies that are bundled into the driver's Docker image, or that are on the local
disk of the submitter's machine. These two types of dependencies are specified via different configuration options to
`spark-submit`:

* Local jars provided by specifying the `--jars` command line argument to `spark-submit`, or by setting `spark.jars`
  in the application's configuration, are treated as jars that are located on the *disk of the driver Docker
  container*. This only applies to jar paths that do not specify a scheme or that have the scheme `file://`. Paths
  with other schemes are fetched from their appropriate locations.
* Local jars provided by specifying the `--upload-jars` command line argument to `spark-submit`, or by setting
  `spark.kubernetes.driver.uploads.jars` in the application's configuration, are treated as jars that are located on
  the *disk of the submitting machine*. These jars are uploaded to the driver Docker container before executing the
  application.
* A main application resource path that does not have a scheme or that has the scheme `file://` is assumed to be on
  the *disk of the submitting machine*. This resource is uploaded to the driver Docker container before executing the
  application. A remote path can still be specified, in which case the resource is fetched from the appropriate
  location.

In all of these cases, the jars are placed on the driver's classpath, and are also sent to the executors. Below are
some examples of providing application dependencies.

To submit an application with both the main resource and two other jars living on the submitting user's machine:

    bin/spark-submit \
      --deploy-mode cluster \
      --class com.example.applications.SampleApplication \
      --master k8s://https://192.168.99.100 \
      --kubernetes-namespace default \
      --upload-jars /home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      /home/exampleuser/exampleapplication/main.jar

Note that since passing the jars through the `--upload-jars` command line argument is equivalent to setting the
`spark.kubernetes.driver.uploads.jars` Spark property, the above will behave identically to this command:

    bin/spark-submit \
      --deploy-mode cluster \
      --class com.example.applications.SampleApplication \
      --master k8s://https://192.168.99.100 \
      --kubernetes-namespace default \
      --conf spark.kubernetes.driver.uploads.jars=/home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      /home/exampleuser/exampleapplication/main.jar

To specify a main application resource that can be downloaded from an HTTP service, where a plugin for that
application is located in the jar `/opt/spark-plugins/app-plugin.jar` on the Docker image's disk:

    bin/spark-submit \
      --deploy-mode cluster \
      --class com.example.applications.PluggableApplication \
      --master k8s://https://192.168.99.100 \
      --kubernetes-namespace default \
      --jars /opt/spark-plugins/app-plugin.jar \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      http://example.com:8080/applications/sparkpluggable/app.jar

Note that since passing the jars through the `--jars` command line argument is equivalent to setting the `spark.jars`
Spark property, the above will behave identically to this command:

    bin/spark-submit \
      --deploy-mode cluster \
      --class com.example.applications.PluggableApplication \
      --master k8s://https://192.168.99.100 \
      --kubernetes-namespace default \
      --conf spark.jars=file:///opt/spark-plugins/app-plugin.jar \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      http://example.com:8080/applications/sparkpluggable/app.jar

### Spark Properties

Below are some other common properties that are specific to Kubernetes. Most of the other configurations are the same
as in the other deployment modes. See the [configuration page](configuration.html) for more information on those.

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
  <td><code>spark.kubernetes.namespace</code></td>
  <td>(none)</td>
  <td>
    The namespace that will be used for running the driver and executor pods. Must be specified. When using
    <code>spark-submit</code> in cluster mode, this can also be passed to <code>spark-submit</code> via the
    <code>--kubernetes-namespace</code> command line argument.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.docker.image</code></td>
  <td><code>spark-driver:2.2.0</code></td>
  <td>
    Docker image to use for the driver. Specify this using the standard Docker tag format.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.docker.image</code></td>
  <td><code>spark-executor:2.2.0</code></td>
  <td>
    Docker image to use for the executors. Specify this using the standard Docker tag format.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.submit.caCertFile</code></td>
  <td>(none)</td>
  <td>
    CA cert file for connecting to Kubernetes over SSL. This file should be located on the submitting machine's disk.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.submit.clientKeyFile</code></td>
  <td>(none)</td>
  <td>
    Client key file for authenticating against the Kubernetes API server. This file should be located on the
    submitting machine's disk.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.submit.clientCertFile</code></td>
  <td>(none)</td>
  <td>
    Client cert file for authenticating against the Kubernetes API server. This file should be located on the
    submitting machine's disk.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.submit.serviceAccountName</code></td>
  <td><code>default</code></td>
  <td>
    Service account that is used when running the driver pod. The driver pod uses this service account when requesting
    executor pods from the API server.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.uploads.jars</code></td>
  <td>(none)</td>
  <td>
    Comma-separated list of jars to be sent to the driver and all executors when submitting the application in
    cluster mode. Refer to <a href="#adding-other-jars">adding other jars</a> for more information.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.uploads.driverExtraClasspath</code></td>
  <td>(none)</td>
  <td>
    Comma-separated list of jars to be sent to the driver only when submitting the application in cluster mode.
  </td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.memoryOverhead</code></td>
  <td>executorMemory * 0.10, with minimum of 384</td>
  <td>
    The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things
    like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size
    (typically 6-10%).
  </td>
</tr>
</table>
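As a sketch of how these properties can be combined on the command line, the following submission authenticates to
the API server with a CA certificate and a client certificate/key pair and raises the executor memory overhead. The
namespace, file paths, and overhead value are placeholders for illustration, not recommended settings:

    # Hypothetical example: submit into a namespace called "spark-jobs", authenticating to the
    # API server with a CA certificate plus a client certificate/key pair. All file paths below
    # are placeholders.
    bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      --master k8s://https://192.168.99.100:8443 \
      --kubernetes-namespace spark-jobs \
      --conf spark.kubernetes.submit.caCertFile=/etc/spark-k8s/ca.crt \
      --conf spark.kubernetes.submit.clientCertFile=/etc/spark-k8s/client.crt \
      --conf spark.kubernetes.submit.clientKeyFile=/etc/spark-k8s/client.key \
      --conf spark.kubernetes.executor.memoryOverhead=512 \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      examples/jars/spark-examples_2.11-2.2.0.jar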
## Current Limitations

Running Spark on Kubernetes is currently an experimental feature. Some restrictions on the current implementation
that should be lifted in the future include:

* Applications can only use a fixed number of executors. Dynamic allocation is not supported.
* Applications can only run in cluster mode.
* Only Scala and Java applications can be run.