This repository contains code and documentation for use with Google Cloud Dataproc.
- codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
- spark-tensorflow provides an example of using Spark as a preprocessing toolchain for TensorFlow jobs. Optionally, it demonstrates using the spark-tensorflow-connector to convert CSV files to TFRecords.
- spark-translate provides a simple demo Spark application that translates words using Google's Translation API and runs on Cloud Dataproc.
See each directory's README for more information.
You can find more Dataproc resources in these GitHub repositories:
- Dataproc initialization actions
- Hadoop/Spark GCS Connector
- Spark BigQuery Connector
- Hadoop BigQuery Connector
- Spark Pubsub Connector
- Spark Spanner Connector
- GCP Token Broker
- Dataproc Python examples
- Dataproc Java Bigtable sample
- Dataproc Spark-Bigtable samples
For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc.
See our other Google Cloud Platform GitHub repos for sample applications and scaffolding for other frameworks and use cases.
- See CONTRIBUTING.md
- See LICENSE