trivenigk/spark-training

Spark Training Repository

This repository contains examples, exercises, and tutorials for the Spark and Hadoop trainings given by dimajix. The latest version is always available on GitHub at

https://github.com/dimajix/spark-training

Contents

The repository contains several types of documents:

  • Source Code for Spark/Scala
  • Jupyter Notebooks for PySpark
  • Zeppelin Notebooks for Spark/Scala
  • Hive SQL scripts
  • Pig scripts
  • ...and much more

External Dependencies

Some notebooks require test data, which dimajix provides on S3 at s3://dimajix-training/data/.
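As a rough illustration of how a PySpark notebook might address that data, the sketch below builds full URIs under the data root. The helper names `dataset_uri` and `read_text` are hypothetical and not part of this repository, and the relative path in the usage note is a placeholder, not a real file name. The default `s3a` scheme is an assumption: it is what the Hadoop S3A connector expects, while some managed platforms accept `s3://` directly.

```python
# Bucket and prefix as stated in this README.
DATA_ROOT = "s3://dimajix-training/data/"


def dataset_uri(relative_path, scheme="s3a"):
    """Return the full URI of a file below the training data root.

    The default scheme "s3a" is an assumption (Hadoop S3A connector);
    pass scheme="s3" if your platform expects plain s3:// URIs.
    """
    bucket_and_prefix = DATA_ROOT.split("://", 1)[1]
    return f"{scheme}://{bucket_and_prefix}{relative_path.lstrip('/')}"


def read_text(spark, relative_path):
    """Read a file as lines of text using an existing SparkSession."""
    return spark.read.text(dataset_uri(relative_path))
```

In a notebook with an active `spark` session, `read_text(spark, "some/file.txt")` would return a DataFrame with one row per line, provided the path exists and the cluster is configured for S3 access.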

Building Executables

The source code can be built using Maven, simply by running

mvn install

from the root directory.

Running Examples

Most code is provided either as interactive notebooks (Jupyter and/or Zeppelin) or as compilable programs. Programs that build jar files come with start scripts, which take care of setting the required environment variables and Spark configuration properties.
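To give an idea of what such a start script has to assemble, here is a minimal sketch in Python that builds a `spark-submit` command line and an environment. It is not one of the repository's actual start scripts: the jar path, main class, and configuration values are hypothetical placeholders, and only the standard `spark-submit` options `--master`, `--class`, and `--conf` are used.

```python
import os
import shlex


def build_spark_submit(jar, main_class, conf=None, master="local[*]", app_args=()):
    """Assemble a spark-submit command as a list of arguments."""
    cmd = ["spark-submit", "--master", master, "--class", main_class]
    # Each Spark configuration property becomes one --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(jar)
    cmd.extend(app_args)
    return cmd


def build_env(extra=None):
    """Copy the current environment and overlay extra variables."""
    env = dict(os.environ)
    env.update(extra or {})
    return env


cmd = build_spark_submit(
    jar="target/example.jar",        # hypothetical jar path
    main_class="com.example.Main",   # hypothetical main class
    conf={"spark.executor.memory": "2g"},
)
print(shlex.join(cmd))  # shell-quoted form of the command
```

To actually launch the job, the command could be passed to `subprocess.run(cmd, env=build_env({"SPARK_HOME": "/opt/spark"}), check=True)`, where the `SPARK_HOME` value is again only an example.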
