The pyspark Kedro starter

Introduction

The code in this repository demonstrates best practice when working with Kedro and PySpark. It contains a Kedro starter template with some initial configuration and an example pipeline, and originates from the Kedro documentation about how to work with PySpark.

Features

Single configuration in /conf/base/spark.yml

While Spark allows you to specify many different configuration options, this starter uses /conf/base/spark.yml as a single configuration location.

SparkSession initialisation

This Kedro starter initialises the SparkSession in the ProjectContext, reading its configuration from /conf/base/spark.yml. Modify this code if you want to customise your SparkSession further, for example to run on YARN.
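
As a rough illustration of the initialisation described above, the sketch below builds a SparkSession from a flat mapping of Spark properties, such as one loaded from /conf/base/spark.yml. It is not the starter's actual code: the names `spark_conf_pairs`, `init_spark_session`, `example_parameters` and the default application name are hypothetical, and the property values are placeholders. Only `SparkConf.setAll` and `SparkSession.builder` are PySpark APIs.

```python
from typing import Any, Dict, List, Tuple


def spark_conf_pairs(parameters: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a mapping of Spark properties into sorted (key, value) string pairs."""
    pairs = []
    for key, value in sorted(parameters.items()):
        # Spark expects lower-case "true"/"false", not Python's "True"/"False".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        pairs.append((str(key), text))
    return pairs


def init_spark_session(parameters: Dict[str, Any], app_name: str = "kedro-pyspark"):
    """Create (or reuse) a SparkSession configured from `parameters`."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().setAll(spark_conf_pairs(parameters))
    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()


# Placeholder values; in the starter these would come from /conf/base/spark.yml.
example_parameters = {
    "spark.driver.maxResultSize": "3g",
    "spark.sql.execution.arrow.pyspark.enabled": True,
}
```

With PySpark installed, `init_spark_session(example_parameters)` would return a session carrying those properties. Under the same assumptions, adding an entry such as `"spark.master": "yarn"` to the mapping is one way to target YARN without touching the initialisation code.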