Update Dataproc Python Cloud Shell walkthrough tutorial #2733

Merged 5 commits on Jan 23, 2020
Changes from 3 commits
24 changes: 15 additions & 9 deletions dataproc/python-api-walkthrough.md
@@ -1,4 +1,4 @@
-# Use the Python Client Library to call Cloud Dataproc APIs
+# Use the Python Client Library to call Dataproc APIs

Estimated completion time: <walkthrough-tutorial-duration duration="5"></walkthrough-tutorial-duration>

@@ -7,13 +7,13 @@ Estimated completion time: <walkthrough-tutorial-duration duration="5"></walkthr
This [Cloud Shell](https://cloud.google.com/shell/docs/) walkthrough leads you
through the steps to use the
[Google Cloud Client Libraries for Python](https://googleapis.github.io/google-cloud-python/latest/dataproc/index.html)
-to programmatically interact with [Cloud Dataproc](https://cloud.google.com/dataproc/docs/).
+to programmatically interact with [Dataproc](https://cloud.google.com/dataproc/docs/).

As you follow this walkthrough, you run Python code that calls
-[Cloud Dataproc gRPC APIs](https://cloud.google.com/dataproc/docs/reference/rpc/)
+[Dataproc gRPC APIs](https://cloud.google.com/dataproc/docs/reference/rpc/)
to:

-* create a Cloud Dataproc cluster
+* create a Dataproc cluster
* submit a small PySpark word sort job to run on the cluster
* get job status
* tear down the cluster after job completion
@@ -24,7 +24,7 @@ The `submit_job_to_cluster.py file` used in this walkthrough is opened in the
Cloud Shell editor when you launch the walkthrough. You can view
the code as you follow the walkthrough steps.

-**For more information**: See [Cloud Dataproc&rarr;Use the Python Client Library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example) for
+**For more information**: See [Dataproc&rarr;Use the Python Client Library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example) for
an explanation of how the code works.

**To reload this walkthrough:** Run the following command from the
@@ -41,8 +41,13 @@ an explanation of how the code works.
1. Create or select a Google Cloud Platform project to use for this tutorial.
* <walkthrough-project-billing-setup permissions=""></walkthrough-project-billing-setup>

-1. Enable the Cloud Dataproc, Compute Engine, and Cloud Storage APIs in your project.
-* <walkthrough-enable-apis apis="dataproc,compute_component,storage-component.googleapis.com"></walkthrough-enable-apis>
+1. Click the link below to enable the Dataproc, Compute Engine, and Cloud Storage APIs
+in a separate GCP console tab in your browser.
+
+**Note:** After you select your project and enable the APIs, return to this tutorial by clicking
+on the **Cloud Shell** tab in your browser.
+
+* [Enable APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataproc,compute_component,storage-component.googleapis.com&redirect=https://console.cloud.google.com)

## Prerequisites (2)

@@ -140,7 +145,8 @@ Job output in Cloud Shell shows cluster creation, job submission,
### Next Steps:

* **View job details from the Console.** View job details by selecting the
-PySpark job from the Cloud Dataproc
+PySpark job from the Dataproc
+=
[Jobs page](https://console.cloud.google.com/dataproc/jobs)
in the Google Cloud Platform Console.

@@ -160,5 +166,5 @@ Job output in Cloud Shell shows cluster creation, job submission,
gsutil rm -r gs://$BUCKET
```

-* **For more information.** See the [Cloud Dataproc documentation](https://cloud.google.com/dataproc/docs/)
+* **For more information.** See the [Dataproc documentation](https://cloud.google.com/dataproc/docs/)
for API reference and product feature information.
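
For readers who want to see the calls the walkthrough describes (create a cluster, submit a PySpark job, get job status, tear down the cluster), here is a minimal sketch. It is not the `submit_job_to_cluster.py` file touched by this PR. It assumes the request-dict calling style of `google-cloud-dataproc` 2.x or later and a regional (non-global) Dataproc region. The helper names (`regional_endpoint`, `build_cluster`, `build_pyspark_job`, `run_walkthrough`), the machine types, and the polling interval are hypothetical choices made for illustration.

```python
import time


def regional_endpoint(region):
    """Client options pointing at a regional Dataproc endpoint (assumes a non-global region)."""
    return {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}


def build_cluster(project_id, cluster_name):
    """Hypothetical minimal cluster spec: one master and two workers."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }


def build_pyspark_job(cluster_name, main_python_file_uri):
    """Job spec that runs a PySpark file stored in Cloud Storage on the named cluster."""
    return {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {"main_python_file_uri": main_python_file_uri},
    }


def run_walkthrough(project_id, region, cluster_name, main_python_file_uri):
    """Create a cluster, run a PySpark job, poll its status, then delete the cluster."""
    # Imported lazily so the helpers above can be used without the library installed.
    from google.cloud import dataproc_v1

    options = regional_endpoint(region)
    clusters = dataproc_v1.ClusterControllerClient(client_options=options)
    jobs = dataproc_v1.JobControllerClient(client_options=options)

    # 1. Create the cluster and block until the long-running operation finishes.
    clusters.create_cluster(
        request={
            "project_id": project_id,
            "region": region,
            "cluster": build_cluster(project_id, cluster_name),
        }
    ).result()

    try:
        # 2. Submit the PySpark job.
        job = jobs.submit_job(
            request={
                "project_id": project_id,
                "region": region,
                "job": build_pyspark_job(cluster_name, main_python_file_uri),
            }
        )
        job_id = job.reference.job_id

        # 3. Poll the job status until it reaches a terminal state.
        while job.status.state.name not in ("DONE", "ERROR", "CANCELLED"):
            time.sleep(5)  # Hypothetical polling interval.
            job = jobs.get_job(
                request={"project_id": project_id, "region": region, "job_id": job_id}
            )
        return job.status.state.name
    finally:
        # 4. Tear down the cluster even if the job failed.
        clusters.delete_cluster(
            request={
                "project_id": project_id,
                "region": region,
                "cluster_name": cluster_name,
            }
        ).result()
```

Calling `run_walkthrough` requires credentials and a project with the Dataproc, Compute Engine, and Cloud Storage APIs enabled. The request builders are kept as plain functions so they can be inspected or tested without touching the network.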