Spark Kernel

A simple Scala application to connect to a Spark cluster and provide a generic, robust API to tap into various Spark APIs. Furthermore, this project intends to provide the ability to send both packaged jars (standard jobs) and code snippets (with revision capability) for scenarios like IPython for dynamic updates. Finally, the kernel is written with the future plan to allow multiple applications to connect to a single kernel to take advantage of the same Spark context.

Vagrant Dev Environment

A Vagrantfile is provided to easily setup a development environment. You will need to install Virtualbox 4.3.12+ and Vagrant 1.6.2+. Once Vagrant and Virtualbox are installed, the environment can be created by running vagrant up. When the VM is running, you can get to the source code by executing:

vagrant ssh
cd /src/spark-kernel

Building from Source

To build the kernel from source, you need to have sbt installed on your machine. Once it is on your path, you can compile the Spark Kernel by running the following from the root of the Spark Kernel directory:

sbt compile

The recommended configuration options for sbt are as follows:

-Xms1024M
-Xmx2048M
-Xss1M
-XX:+CMSClassUnloadingEnabled
-XX:MaxPermSize=1024M

Library Dependencies

The Spark Kernel uses ZeroMQ as the medium for communication. In order for the Spark Kernel to be able to use this protocol, the ZeroMQ library needs to be installed on the system where the kernel will be run. The Vagrant and Docker environments set this up for you.

For Mac OS X, you can use Homebrew to install the library:

brew install zeromq22

For aptitude-based systems (such as Ubuntu 14.04), you can install via apt-get:

apt-get install libzmq-dev

Usage Instructions

The Spark Kernel is provided as a series of jars, which can be executed on their own or as part of the launch process of an IPython notebook.

The following command line options are available:

--profile - the file to load containing the ZeroMQ port information
--help - displays the help menu detailing usage instructions
--master - location of the Spark master (defaults to local[*])

Additionally, ZeroMQ configurations can be passed as command line arguments

--ip
--stdin-port
--shell-port
--iopub-port
--control-port
--heartbeat-port

Ports can also be specified as Environment variables:

IP
STDIN_PORT
SHELL_PORT
IOPUB_PORT
CONTROL_PORT
HB_PORT

Packing the kernel

We are utilizing xerial/sbt-pack to package and distribute the Spark Kernel.

You can run the following to package the kernel, which creates a directory in kernel/target/pack contains a Makefile that can be used to install the kernel on the current machine:

sbt kernel/pack

To install the kernel, run the following:

cd kernel/target/pack
make install

This will place the necessary jars in ~/local/kernel/current/lib and provides a convient script to start the kernel located at ~/local/kernel/current/bin/sparkkernel.

Running A Docker Container

The Spark Kernel can be run in a docker container using Docker 1.0.0+. There is a Dockerfile included in the root of the project. You will need to compile and pack the Spark Kernel before the docker image can be built.

sbt compile
sbt pack
docker build -t spark-kernel .

After the image has been successfully created, you can run your container by executing the command:

docker run -d -e IP=0.0.0.0 spark-kernel

You must always include -e IP=0.0.0.0 to allow the kernel to bind to the docker container's IP. The environment variables listed in the getting started section can be used in the docker run command. This allows you to explicitly set the ports for the kernel.

Development Instructions

You must have SBT 0.13.5+ installed. From the command line, you can attempt to run the project by executing sbt kernel/run <args> from the root directory of the project. You can run all tests using sbt test (see instructions below for more testing details).

For IntelliJ developers, you can attempt to create an IntelliJ project structure using sbt gen-idea.

Running tests

There are four levels of test in this project:

Unit - tests that isolate a specific class/object/etc for its functionality
Integration - tests that illustrate functionality between multiple components
System - tests that demonstrate correctness across the entire system
Scratch - tests isolated in a local branch, used for quick sanity checks, not for actual inclusion into testing solution

To execute specific tests, run sbt with the following:

Unit - sbt unit:test
Integration - sbt integration:test
System - sbt system:test
Scratch - sbt scratch:test

To run all tests, use sbt test!

The naming convention for tests is as follows:

Unit - test classes end with Spec e.g. CompleteRequestSpec
- Placed under com.ibm.spark
Integration - test classes end with SpecForIntegration e.g. InterpreterWithActorSpecForIntegration
- Placed under integration
System - test classes end with SpecForSystem e.g. InputToAddJarSpecForSystem
- Placed under system
Scratch
- Placed under scratch

It is also possible to run tests for a specific project by using the following syntax in sbt:

sbt <PROJECT>/test

Name		Name	Last commit message	Last commit date
Latest commit History 440 Commits
client		client
docs		docs
kernel-api		kernel-api
kernel		kernel
project		project
protocol		protocol
resources		resources
.bash_profile		.bash_profile
.gitignore		.gitignore
.pairs		.pairs
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
NOTICE		NOTICE
README.md		README.md
Vagrantfile		Vagrantfile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spark Kernel

Vagrant Dev Environment

Building from Source

Library Dependencies

Usage Instructions

Packing the kernel

Running A Docker Container

Development Instructions

Running tests

About

Releases

Packages

Languages

License

josephwinston/spark-kernel

Folders and files

Latest commit

History

Repository files navigation

Spark Kernel

Vagrant Dev Environment

Building from Source

Library Dependencies

Usage Instructions

Packing the kernel

Running A Docker Container

Development Instructions

Running tests

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages