This section provides examples of how to author Spark tasks and workflows using FlyteKit, as well as the additional setup required to run Spark jobs via Flyte.
- For Spark, the image must contain the Spark dependencies as well as the correct entrypoint for the Spark driver/executors. This can be achieved by using the provided `flytekit_install_spark.sh` script, as referenced in the Dockerfile included here.
- In addition, Flyte uses the SparkOperator to run Spark jobs, as well as a separate K8s Service Account/Role per namespace. All of these are created as part of the standard Flyte deploy; please refer to the Getting Started guide for more details on how to deploy Flyte.
- Based on the resources required for your Spark job (across driver and executors), you might have to tweak the resource quotas for the namespace, as sketched after this list.
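
For illustration, a namespace quota sized for a job's combined driver and executor resources could be created with the official Kubernetes Python client (a minimal sketch; the namespace, quota name, and limit values are placeholders, and editing the quota via `kubectl` works just as well):

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig; use
# config.load_incluster_config() when running inside the cluster.
config.load_kube_config()

# Placeholder values: target the Flyte project-domain namespace and set
# limits large enough for the driver plus all executors of your job.
namespace = "flytesnacks-development"
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="spark-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={"limits.cpu": "64", "limits.memory": "128Gi"}
    ),
)

client.CoreV1Api().create_namespaced_resource_quota(namespace=namespace, body=quota)
```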
Flyte supports both Python and Scala/Java Spark tasks:
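
For example, a Python Spark task can be declared by passing a Spark task config from the `flytekitplugins-spark` plugin (a minimal sketch; the task name, workflow, and pi-estimation logic are illustrative):

```python
import random

import flytekit
from flytekit import task, workflow
from flytekitplugins.spark import Spark


@task(
    task_config=Spark(
        # Spark settings applied to this task's driver/executors;
        # these sizes are placeholders, tune them to your workload.
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
        }
    ),
)
def estimate_pi(partitions: int) -> float:
    # The plugin injects a Spark session into the task's execution context.
    sess = flytekit.current_context().spark_session
    n = 100_000 * partitions

    def inside(_: int) -> int:
        x, y = random.random(), random.random()
        return 1 if x * x + y * y <= 1.0 else 0

    count = sess.sparkContext.parallelize(range(n), partitions).map(inside).sum()
    return 4.0 * count / n


@workflow
def spark_workflow(partitions: int = 10) -> float:
    return estimate_pi(partitions=partitions)
```

When executed locally, flytekit typically starts a local Spark session, so the same task can be iterated on without a cluster before being registered and run via the SparkOperator.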