In this repo, I'll show you a few different ways to run Spark outside of a Hadoop cluster. Kubernetes clusters are becoming more and more common in companies of all sizes, and using them to run Spark workloads is an attractive option. With that in mind, I'd like to invite you to join me on a learning journey in search of broader options for doing Big Data.
Here you'll find three ways of running Spark on containers:
- K8s Cluster (using Kubernetes as your Spark master) — see the spark-submit sketch after this list
- K8s Spark Operator (a Kubernetes-native operator for managing Spark applications)
- Docker Jupyter PySpark (a really nice way to run ad-hoc jobs and local experiments)
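To give a flavour of the first mode, here is a minimal spark-submit sketch pointed at a Kubernetes API server, along the lines of the upstream Spark-on-K8s documentation. The API server address, container image, and example jar path are placeholders you'd swap for your own values; the full walkthrough lives in the step-by-step guide linked below.

```bash
# Submit the bundled SparkPi example in cluster mode, using Kubernetes as the Spark master.
# Replace <k8s-apiserver>:<port>, <your-spark-image>, and the jar path with your own values.
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_<scala-version>-<spark-version>.jar
```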
I really hope you have fun with this experience and come away more confident about stepping outside of your traditional Hadoop cluster :)
Click HERE to follow the step-by-step guide :)
| Mode | Status |
|---|---|
| K8s: Spark-Submit | OK |
| GCP/spark-on-k8s-operator | OK (currently in Beta) |
| Docker: Jupyter PySpark | OK |