SPARK-3883: Added SSL setup documentation
jacek-lewandowski committed Feb 2, 2015
1 parent 2532668 commit fb31b49
Showing 4 changed files with 188 additions and 17 deletions.
60 changes: 48 additions & 12 deletions core/src/main/scala/org/apache/spark/SSLOptions.scala
@@ -22,6 +22,23 @@ import java.io.File
import com.typesafe.config.{Config, ConfigFactory, ConfigValueFactory}
import org.eclipse.jetty.util.ssl.SslContextFactory

/** SSLOptions class is a common container for SSL configuration options. It offers methods to
* generate specific objects to configure SSL for different communication protocols.
*
* SSLOptions is intended to provide the largest common set of SSL settings supported by all the
* protocols for which it can generate configuration. Since Akka doesn't support client
* authentication with SSL, SSLOptions cannot support it either.
*
* @param enabled enables or disables SSL; if it is set to false, the rest of the
* settings are disregarded
* @param keyStore a path to the key-store file
* @param keyStorePassword a password to access the key-store file
* @param keyPassword a password to access the private key in the key-store
* @param trustStore a path to the trust-store file
* @param trustStorePassword a password to access the trust-store file
* @param protocol SSL protocol name supported by Java (note that SSLv3 has been compromised)
* @param enabledAlgorithms a set of encryption algorithms to use
*/
private[spark] case class SSLOptions(
enabled: Boolean = false,
keyStore: Option[File] = None,
@@ -32,9 +49,8 @@ private[spark] case class SSLOptions(
protocol: Option[String] = None,
enabledAlgorithms: Set[String] = Set.empty) {

/**
* Creates a Jetty SSL context factory according to the SSL settings represented by this object.
*/
/** Creates a Jetty SSL context factory according to the SSL settings represented by this object.
*/
def createJettySslContextFactory(): Option[SslContextFactory] = {
if (enabled) {
val sslContextFactory = new SslContextFactory()
@@ -53,10 +69,9 @@ private[spark] case class SSLOptions(
}
}

/**
* Creates an Akka configuration object which contains all the SSL settings represented by this
* object. It can be used then to compose the ultimate Akka configuration.
*/
/** Creates an Akka configuration object which contains all the SSL settings represented by this
* object. It can be used then to compose the ultimate Akka configuration.
*/
def createAkkaConfig: Option[Config] = {
import scala.collection.JavaConversions._
if (enabled) {
@@ -84,6 +99,7 @@ private[spark] case class SSLOptions(
}
}

/** Returns a string representation of this SSLOptions with all the passwords masked. */
override def toString: String = s"SSLOptions{enabled=$enabled, " +
s"keyStore=$keyStore, keyStorePassword=${keyStorePassword.map(_ => "xxx")}, " +
s"trustStore=$trustStore, trustStorePassword=${trustStorePassword.map(_ => "xxx")}, " +
@@ -93,11 +109,31 @@ private[spark] case class SSLOptions(

private[spark] object SSLOptions extends Logging {

/**
* Resolves SSLOptions settings from a given Spark configuration object at a given namespace.
* The parent directory of that location is used as a base directory to resolve relative paths
* to keystore and truststore.
*/
/** Resolves SSLOptions settings from a given Spark configuration object at a given namespace.
*
* The following settings are allowed:
* $ - `[ns].enabled` - `true` or `false`, to enable or disable SSL respectively
* $ - `[ns].keyStore` - a path to the key-store file; can be relative to the current directory
* $ - `[ns].keyStorePassword` - a password to the key-store file
* $ - `[ns].keyPassword` - a password to the private key
* $ - `[ns].trustStore` - a path to the trust-store file; can be relative to the current
* directory
* $ - `[ns].trustStorePassword` - a password to the trust-store file
* $ - `[ns].protocol` - a protocol name supported by a particular Java version
* $ - `[ns].enabledAlgorithms` - a comma separated list of ciphers
*
* For a list of protocols and ciphers supported by particular Java versions, you may go to
* [[https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https Oracle
* blog page]].
*
* You can optionally specify the default configuration. If you do, for each setting which is
* missing in SparkConf, the corresponding setting is used from the default configuration.
*
* @param conf Spark configuration object where the settings are collected from
* @param ns the namespace name
* @param defaults the default configuration
* @return [[org.apache.spark.SSLOptions]] object
*/
def parse(conf: SparkConf, ns: String, defaults: Option[SSLOptions] = None): SSLOptions = {
val enabled = conf.getBoolean(s"$ns.enabled", defaultValue = defaults.exists(_.enabled))

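The default-fallback behaviour documented for `parse` can be illustrated with a small,
self-contained sketch. This is not Spark's actual implementation; `SslOpts` and the simplified
`parse` below are hypothetical stand-ins that only model the resolution order: a setting missing
under the namespace `ns` falls back to the corresponding setting in `defaults`.

```scala
// Hypothetical, simplified model of SSLOptions.parse's default fallback.
case class SslOpts(
    enabled: Boolean = false,
    keyStore: Option[String] = None,
    protocol: Option[String] = None)

def parse(conf: Map[String, String], ns: String, defaults: Option[SslOpts]): SslOpts =
  SslOpts(
    // `enabled` falls back to the default's flag when the key is absent
    enabled = conf.get(s"$ns.enabled").map(_.toBoolean)
      .getOrElse(defaults.exists(_.enabled)),
    // each optional setting prefers the namespaced key, then the default
    keyStore = conf.get(s"$ns.keyStore").orElse(defaults.flatMap(_.keyStore)),
    protocol = conf.get(s"$ns.protocol").orElse(defaults.flatMap(_.protocol)))

val globalOpts = parse(
  Map("spark.ssl.enabled" -> "true", "spark.ssl.protocol" -> "TLSv1.2"),
  "spark.ssl", defaults = None)

val akkaOpts = parse(
  Map("spark.ssl.akka.keyStore" -> "/path/akka-ks.jks"),
  "spark.ssl.akka", defaults = Some(globalOpts))
// akkaOpts inherits enabled=true and protocol=TLSv1.2 from the global options
```

With this shape, a protocol-specific namespace only needs to list the settings it actually
changes; everything else flows down from the global configuration.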
41 changes: 36 additions & 5 deletions core/src/main/scala/org/apache/spark/SecurityManager.scala
@@ -59,7 +59,7 @@ import org.apache.spark.network.sasl.SecretKeyHolder
* Spark also has a set of admin acls (`spark.admin.acls`) which is a set of users/administrators
* who always have permission to view or modify the Spark application.
*
* Spark does not currently support encryption after authentication.
* Starting from version 1.3, Spark has partial support for encrypted connections with SSL.
*
* At this point spark has multiple communication protocols that need to be secured and
* different underlying mechanisms are used depending on the protocol:
@@ -71,8 +71,9 @@ import org.apache.spark.network.sasl.SecretKeyHolder
* to connect to the server. There is no control of the underlying
authentication mechanism so it's not clear if the password is passed in
* plaintext or uses DIGEST-MD5 or some other mechanism.
* Akka also has an option to turn on SSL, this option is not currently supported
* but we could add a configuration option in the future.
*
Akka also has an option to turn on SSL; this option is now supported (see
the details below).
*
* - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
* for the HttpServer. Jetty supports multiple authentication mechanisms -
@@ -81,8 +82,9 @@ import org.apache.spark.network.sasl.SecretKeyHolder
* to authenticate using DIGEST-MD5 via a single user and the shared secret.
* Since we are using DIGEST-MD5, the shared secret is not passed on the wire
* in plaintext.
* We currently do not support SSL (https), but Jetty can be configured to use it
* so we could add a configuration option for this in the future.
*
* We currently support SSL (https) for this communication protocol (see the details
* below).
*
* The Spark HttpServer installs the HashLoginServer and configures it to DIGEST-MD5.
* Any clients must specify the user and password. There is a default
@@ -146,6 +148,35 @@ import org.apache.spark.network.sasl.SecretKeyHolder
* authentication. Spark will then use that user to compare against the view acls to do
authorization. If no filter is in place, the user is generally null and no authorization
* can take place.
*
* Connection encryption (SSL) configuration is organized hierarchically. The user can configure
the default SSL settings which will be used for all the supported communication protocols unless
they are overridden by protocol-specific settings. This way the user can easily provide the
common settings for all the protocols without losing the ability to configure each one
individually.
*
All the SSL settings of the form `spark.ssl.xxx`, where `xxx` is a particular configuration
property, denote the global configuration for all the supported protocols. To override the
global configuration for a particular protocol, the properties must be overridden in the
protocol-specific namespace. Use `spark.ssl.yyy.xxx` settings to override the global
configuration for the particular protocol denoted by `yyy`. Currently `yyy` can be either
`akka` for Akka-based connections or `fs` for broadcast and file server.
*
* Refer to [[org.apache.spark.SSLOptions]] documentation for the list of
* options that can be specified.
*
SecurityManager initializes SSLOptions objects for different protocols separately. Each
SSLOptions object parses the Spark configuration at a given namespace and builds the common
representation of SSL settings. SSLOptions is then used to provide protocol-specific
configuration, such as the Typesafe configuration for Akka or the SslContextFactory for Jetty.
* SSL must be configured on each node and configured for each component involved in
* communication using the particular protocol. In YARN clusters, the key-store can be prepared on
the client side, then distributed and used by the executors as part of the application
* (YARN allows the user to deploy files before the application is started).
* In standalone deployment, the user needs to provide key-stores and configuration
* options for master and workers. In this mode, the user may allow the executors to use the SSL
* settings inherited from the worker which spawned that executor. It can be accomplished by
* setting `spark.ssl.useNodeLocalConf` to `true`.
*/

private[spark] class SecurityManager(sparkConf: SparkConf)
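The lookup order described in the comment above can be sketched in a few lines. This is a
minimal, self-contained model (not SecurityManager's actual code): a protocol-specific key such
as `spark.ssl.akka.enabledAlgorithms` wins over the global `spark.ssl.enabledAlgorithms`, which
applies whenever no override exists.

```scala
// Hypothetical helper modelling the spark.ssl.<protocol>.<key> override rule.
def sslSetting(conf: Map[String, String], protocol: String, key: String): Option[String] =
  conf.get(s"spark.ssl.$protocol.$key")   // protocol-specific override first
    .orElse(conf.get(s"spark.ssl.$key"))  // then the global setting

val conf = Map(
  "spark.ssl.enabledAlgorithms"      -> "TLS_RSA_WITH_AES_128_CBC_SHA",
  "spark.ssl.akka.enabledAlgorithms" -> "TLS_RSA_WITH_AES_256_CBC_SHA")

val akka = sslSetting(conf, "akka", "enabledAlgorithms") // akka-specific override applies
val fs   = sslSetting(conf, "fs", "enabledAlgorithms")   // falls back to the global value
```

The same rule applies uniformly to every `spark.ssl.xxx` property, which is what lets a user
configure common defaults once and override only where a protocol needs different values.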
80 changes: 80 additions & 0 deletions docs/configuration.md
@@ -1234,6 +1234,86 @@ Apart from these, the following properties are also available, and may be useful
</tr>
</table>

#### Encryption

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.ssl.enabled</code></td>
<td>false</td>
<td>
<p>Whether to enable SSL connections on all supported protocols.</p>

<p>All the SSL settings of the form <code>spark.ssl.xxx</code>, where <code>xxx</code> is a
particular configuration property, denote the global configuration for all the supported
protocols. To override the global configuration for a particular protocol,
the properties must be overridden in the protocol-specific namespace.</p>

<p>Use <code>spark.ssl.YYY.XXX</code> settings to override the global configuration for the
particular protocol denoted by <code>YYY</code>. Currently <code>YYY</code> can be
either <code>akka</code> for Akka-based connections or <code>fs</code> for broadcast and
file server.</p>
</td>
</tr>
<tr>
<td><code>spark.ssl.keyStore</code></td>
<td>None</td>
<td>
A path to a key-store file. The path can be absolute or relative to the directory in
which the component is started.
</td>
</tr>
<tr>
<td><code>spark.ssl.keyStorePassword</code></td>
<td>None</td>
<td>
A password to the key-store.
</td>
</tr>
<tr>
<td><code>spark.ssl.keyPassword</code></td>
<td>None</td>
<td>
A password to the private key in the key-store.
</td>
</tr>
<tr>
<td><code>spark.ssl.trustStore</code></td>
<td>None</td>
<td>
A path to a trust-store file. The path can be absolute or relative to the directory in
which the component is started.
</td>
</tr>
<tr>
<td><code>spark.ssl.trustStorePassword</code></td>
<td>None</td>
<td>
A password to the trust-store.
</td>
</tr>
<tr>
<td><code>spark.ssl.protocol</code></td>
<td>None</td>
<td>
A protocol name. The protocol must be supported by the JVM. A reference list of protocols
can be found on
<a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">this page</a>.
</td>
</tr>
<tr>
<td><code>spark.ssl.enabledAlgorithms</code></td>
<td>Empty</td>
<td>
A comma-separated list of ciphers. The specified ciphers must be supported by the JVM.
A reference list of ciphers can be found on
<a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">this page</a>.
</td>
</tr>
</table>
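For illustration, the properties in the table above could be combined in a
`conf/spark-defaults.conf` fragment like the following. All paths, passwords, and cipher names
here are placeholder values, not recommendations:

```
spark.ssl.enabled                 true
spark.ssl.keyStore                /opt/spark/conf/keystore.jks
spark.ssl.keyStorePassword        storepass
spark.ssl.keyPassword             keypass
spark.ssl.trustStore              /opt/spark/conf/truststore.jks
spark.ssl.trustStorePassword      trustpass
spark.ssl.protocol                TLSv1.2
spark.ssl.enabledAlgorithms       TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA
# Protocol-specific override: restrict the cipher list for Akka connections only
spark.ssl.akka.enabledAlgorithms  TLS_RSA_WITH_AES_256_CBC_SHA
```

Everything in the `spark.ssl` namespace acts as the global default; the single
`spark.ssl.akka.*` line overrides one setting for Akka connections while the `fs` protocol keeps
the global values.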


#### Spark Streaming
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
24 changes: 24 additions & 0 deletions docs/security.md
@@ -20,6 +20,30 @@ Spark allows for a set of administrators to be specified in the acls who always

If your applications are using event logging, the directory where the event logs go (`spark.eventLog.dir`) should be manually created and have the proper permissions set on it. If you want those log files secured, the permissions should be set to `drwxrwxrwxt` for that directory. The owner of the directory should be the super user who is running the history server and the group permissions should be restricted to super user group. This will allow all users to write to the directory but will prevent unprivileged users from removing or renaming a file unless they own the file or directory. The event log files will be created by Spark with permissions such that only the user and group have read and write access.

## Encryption

Spark supports SSL for the Akka and HTTP (broadcast and file server) protocols. However, SSL is not yet supported for the WebUI or the block transfer service.

Connection encryption (SSL) configuration is organized hierarchically. The user can configure the default SSL settings, which will be used for all the supported communication protocols unless they are overridden by protocol-specific settings. This way the user can easily provide common settings for all the protocols without losing the ability to configure each one individually. The common SSL settings live in the `spark.ssl` namespace of the Spark configuration, while the Akka SSL configuration is at `spark.ssl.akka` and the HTTP (broadcast and file server) SSL configuration is at `spark.ssl.fs`. The full breakdown can be found on the [configuration page](configuration.html).

SSL must be configured on each node and configured for each component involved in communication using the particular protocol.

### YARN mode
The key-store can be prepared on the client side and then distributed and used by the executors as part of the application. This is possible because the user can deploy files before the application is started in YARN by using the `spark.yarn.dist.files` or `spark.yarn.dist.archives` configuration settings. The responsibility for encrypting these files in transit lies with YARN and has nothing to do with Spark.
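As an illustrative sketch only (the local paths, application class, and jar name are
hypothetical placeholders; whether the distributed files are addressable by bare file name in
the container working directory is an assumption of this sketch), a submission shipping the
stores with the application might look like:

```
spark-submit --master yarn-cluster \
  --conf spark.yarn.dist.files=/local/certs/keystore.jks,/local/certs/truststore.jks \
  --conf spark.ssl.enabled=true \
  --conf spark.ssl.keyStore=keystore.jks \
  --conf spark.ssl.keyStorePassword=storepass \
  --conf spark.ssl.trustStore=truststore.jks \
  --conf spark.ssl.trustStorePassword=trustpass \
  --class com.example.MyApp my-app.jar
```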

### Standalone mode
The user needs to provide key-stores and configuration options for the master and workers. They have to be set by attaching appropriate Java system properties to the `SPARK_MASTER_OPTS` and `SPARK_WORKER_OPTS` environment variables, or just to `SPARK_DAEMON_JAVA_OPTS`. In this mode, the user may allow the executors to use the SSL settings inherited from the worker which spawned that executor. This can be accomplished by setting `spark.ssl.useNodeLocalConf` to `true`. If that parameter is set, the settings provided by the user on the client side are not used by the executors.
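For example, a hypothetical `conf/spark-env.sh` fragment (paths and passwords are placeholder
values) that applies the same settings to both daemons via `SPARK_DAEMON_JAVA_OPTS`:

```
# Shared by the master and the workers; placeholder paths and passwords.
SPARK_DAEMON_JAVA_OPTS="-Dspark.ssl.enabled=true \
  -Dspark.ssl.keyStore=/opt/spark/conf/keystore.jks \
  -Dspark.ssl.keyStorePassword=storepass \
  -Dspark.ssl.keyPassword=keypass \
  -Dspark.ssl.trustStore=/opt/spark/conf/truststore.jks \
  -Dspark.ssl.trustStorePassword=trustpass"
```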

### Preparing the key-stores
Key-stores can be generated by the `keytool` program. The reference documentation for this tool is
[here](https://docs.oracle.com/javase/7/docs/technotes/tools/solaris/keytool.html). The most basic
steps to configure the key-stores and the trust-store for the standalone deployment mode are as
follows:
* Generate a key pair for each node
* Export the public key of the key pair to a file on each node
* Import all exported public keys into a single trust-store
* Distribute the trust-store over the nodes

## Configuring Ports for Network Security

Spark makes heavy use of the network, and some environments have strict requirements for using tight
