In this repo, I'll show you a few different ways to run Spark outside of a Hadoop cluster. Kubernetes clusters are becoming more and more common in companies of all sizes, and using them to run Spark workloads is an attractive option. With that in mind, I'd like to invite you to join me on a learning journey in search of broader options for doing Big Data.
Here you'll find three ways of running Spark on containers:
- K8s Cluster (using Kubernetes as your Spark master) — see the spark-submit sketch after this list
- K8s Spark Operator (a Kubernetes-native operator for managing Spark applications)
- Docker Jupyter PySpark (a really nice way to run ad-hoc jobs and local experiments)
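To give a flavour of the first mode, here is a minimal spark-submit sketch pointed at a Kubernetes API server, along the lines of the upstream Spark-on-K8s documentation. The API server address, container image, and example jar path are placeholders you'd swap for your own values; the full walkthrough lives in the step-by-step guide linked below.

```bash
# Submit the bundled SparkPi example in cluster mode, using Kubernetes as the Spark master.
# Replace <k8s-apiserver>:<port>, <your-spark-image>, and the jar path with your own values.
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_<scala-version>-<spark-version>.jar
```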
I really hope you have fun with this experience and come away more confident about stepping outside of your traditional Hadoop cluster :)
Click HERE to follow the step-by-step guide :)
| Mode | Status |
|---|---|
| K8s: Spark-Submit | OK |
| GCP/spark-on-k8s-operator | OK (currently in Beta) |
| Docker: Jupyter PySpark | OK |