arnehuang/cloud-dataproc

 
 

Google Cloud Dataproc

This repository contains code and documentation for use with Google Cloud Dataproc.

Samples in this Repository

  • codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
  • spark-tensorflow provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. Optionally, it demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.
  • spark-translate provides a simple demo Spark application that runs on Cloud Dataproc and translates words using Google's Translation API.

See each directory's README for more information.

Additional Dataproc Repositories

You can find more Dataproc resources in these GitHub repositories:

For more information

For more information, review the Dataproc documentation. You can also post questions to the Stack Overflow community using the tag google-cloud-dataproc. See our other Google Cloud Platform GitHub repositories for sample applications and scaffolding for other frameworks and use cases.

Contributing changes

Licensing
