This repository has been archived by the owner on Jan 28, 2022. It is now read-only.

Add samples #95

Merged
merged 1 commit into from Oct 21, 2019
7 changes: 5 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -5,11 +5,11 @@

> This project is experimental. Expect the API to change. It is not recommended for production environments.


## Introduction

Kubernetes offers the facility of extending it's API through the concept of 'Operators' ([Introducing Operators: Putting Operational Knowledge into Software](https://coreos.com/blog/introducing-operators.html)). This repository contains the resources and code to deploy an Azure Databricks Operator for Kubernetes.
Kubernetes offers the facility of extending its API through the concept of 'Operators' ([Introducing Operators: Putting Operational Knowledge into Software](https://coreos.com/blog/introducing-operators.html)). This repository contains the resources and code to deploy an Azure Databricks Operator for Kubernetes.

It is a Kubernetes controller that watches Custom Resource Definitions (CRDs) that define a Databricks job.

![alt text](docs/images/azure-databricks-operator.jpg "high level architecture")

@@ -24,6 +24,8 @@ The project was built using

For deployment guides please see [deploy.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/deploy.md)

For samples and simple use cases on how to use the operator please see [samples.md](docs/samples.md)

## Roadmap

Check [roadmap.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/roadmap.md) for what has been supported and what's coming.
@@ -34,6 +36,7 @@ Few topics are discussed in the [resources.md](https://github.com/microsoft/azur

- Kubernetes on WSL
- Build pipelines
- Dev container

## Contributing

Binary file added docs/images/copy-filepath-in-dbricks.jpg
Binary file added docs/images/create-cluster.jpg
Binary file added docs/images/databricks-job.jpg
Binary file added docs/images/direct-run.jpg
Binary file added docs/images/import-notebooks-databricks.gif
Binary file added docs/images/run-periodic-job.jpg
Binary file added docs/images/secretscopes-runs.jpg
55 changes: 55 additions & 0 deletions docs/samples.md
@@ -0,0 +1,55 @@
# Direct Run

## 1. Create a spark cluster and run a databricks notebook

[Direct run sample](samples/1_direct_run) shows how you can create a spark cluster and run a databricks notebook on it.

1. Upload [basic1.ipynb](samples/1_direct_run/basic1.ipynb)


![alt text](images/import-notebooks-databricks.gif "import databricks notebook")

2. Update `notebook_path` in `samples/1_direct_run/run_basic1.yaml` file

![copy filepath in data bricks](images/copy-filepath-in-dbricks.jpg)

3. Apply `samples/1_direct_run/run_basic1.yaml`

![direct run](images/direct-run.jpg)
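The `base_parameters` block in `run_basic1.yaml` is what feeds the notebook's `dbutils.widgets.get("name")` call. A minimal local sketch of that hand-off (the `FakeWidgets` class is hypothetical, standing in for Databricks' `dbutils.widgets`):

```python
# Hypothetical stand-in for dbutils.widgets: the base_parameters declared in
# the Run spec become widget values the notebook can read at run time.
class FakeWidgets:
    def __init__(self, base_parameters):
        self._params = dict(base_parameters)

    def get(self, name):
        return self._params[name]

# base_parameters as declared in run_basic1.yaml
widgets = FakeWidgets({"name": "Azadeh"})

# Mirrors the second cell of basic1.ipynb
name = widgets.get("name")
print("hello, {}!".format(name))  # hello, Azadeh!
```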

## 2. Create an interactive spark cluster and run a databricks job on that cluster


[Databricks periodic job sample](samples/2_job_run) shows how you can create an interactive spark cluster in databricks and attach it to one or many databricks notebooks.

1. Apply `samples/2_job_run/cluster_interactive1.yaml` file

2. Update `existing_cluster_id` in `samples/2_job_run/run_basic1_periodic_on_existing_cluster.yaml` file

3. Apply `samples/2_job_run/run_basic1_periodic_on_existing_cluster.yaml`

![direct run](images/run-periodic-job.jpg)
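The schedule in `run_basic1_periodic_on_existing_cluster.yaml` uses the Quartz cron expression `0 0/1 * * * ?`, which fires once per minute. A simplified field-by-field look (not a full Quartz parser):

```python
# Quartz cron expressions have 6 or 7 fields:
# seconds minutes hours day-of-month month day-of-week [year]
expr = "0 0/1 * * * ?"
fields = expr.split()
assert len(fields) in (6, 7)

labels = ["seconds", "minutes", "hours", "day-of-month", "month", "day-of-week"]
schedule = dict(zip(labels, fields))

# "0/1" in the minutes field means: starting at minute 0, every 1 minute,
# so this Djob is triggered once per minute.
start, step = schedule["minutes"].split("/")
print(int(start), int(step))  # 0 1
```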

## 3. Create secret scopes and install dependencies/libraries on a spark cluster

[Databricks twitter ingest sample](samples/3_secret_scope) shows how you can create secret scopes in databricks, install libraries on your cluster, and run a job that ingests data, acting as a message producer that sends messages to eventhub.

1. Upload [eventhub_ingest.ipynb](samples/3_secret_scope/eventhub_ingest.ipynb)
2. Upload [twitter_ingest.ipynb](samples/3_secret_scope/twitter_ingest.ipynb)
3. [Create eventhub namespace and eventhub in azure](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-create)
4. Set up your Twitter developer account
5. Replace `xxxxx` with the correct values and create these two secrets:

```
kubectl create secret generic twitter-secret --from-literal=TwitterAPIkey=xxxxx --from-literal=TwitterAPISecret=xxxxx --from-literal=TwitterAccessToken=xxxxx --from-literal=TwitterAccessSecret=xxxxx
```

```
kubectl create secret generic eventhubnamespace-secret --from-literal=EventhubNamespace=xxxxx --from-literal=SharedAccessKeyName=xxxxx --from-literal=SharedAccessKey=xxxxx --from-literal=ConnectionString=Endpoint=sb://xxxxx.servicebus.windows.net/;SharedAccessKeyName=xxxxx;SharedAccessKey=xxxxx
```

6. Apply `samples/3_secret_scope/secretscope_twitter.yaml`
7. Apply `samples/3_secret_scope/secretscope_eventhub.yaml`
8. Apply `samples/2_job_run/cluster_interactive1.yaml` file if you haven't already
9. Update `existing_cluster_id` in `samples/3_secret_scope/run_twitter1.yaml` file
10. Apply `samples/3_secret_scope/run_twitter1.yaml`
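The `ConnectionString` value created above follows the standard Event Hubs connection-string format. A small sketch assembling it from its parts (the `xxxxx` values are placeholders, exactly as in the `kubectl` commands above):

```python
# Assemble an Event Hubs connection string from its parts, mirroring the
# format expected in the eventhubnamespace-secret (values are placeholders).
namespace = "xxxxx"
key_name = "xxxxx"
key = "xxxxx"

connection_string = (
    "Endpoint=sb://{ns}.servicebus.windows.net/;"
    "SharedAccessKeyName={kn};SharedAccessKey={k}"
).format(ns=namespace, kn=key_name, k=key)

print(connection_string)
```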
1 change: 1 addition & 0 deletions docs/samples/1_direct_run/basic1.ipynb
@@ -0,0 +1 @@
{"cells":[{"cell_type":"code","source":["1+1"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"><span class=\"ansired\">Out[</span><span class=\"ansired\">3</span><span class=\"ansired\">]: </span>2</div>"]}}],"execution_count":1},{"cell_type":"code","source":["name = dbutils.widgets.get(\"name\")\nprint(\"hello, {}!\".format(name)) "],"metadata":{},"outputs":[],"execution_count":2}],"metadata":{"name":"basic1","notebookId":4373330245447976},"nbformat":4,"nbformat_minor":0}
14 changes: 14 additions & 0 deletions docs/samples/1_direct_run/run_basic1.yaml
@@ -0,0 +1,14 @@
apiVersion: databricks.microsoft.com/v1alpha1
kind: Run
metadata:
  name: drun-basic1
spec:
  # create a run directly without a job
  new_cluster:
    spark_version: 5.3.x-scala2.11
    node_type_id: Standard_D3_v2
    num_workers: 3
  notebook_task:
    base_parameters:
      "name": "Azadeh"
    notebook_path: "/samples/basic1"
18 changes: 18 additions & 0 deletions docs/samples/2_job_run/cluster_interactive1.yaml
@@ -0,0 +1,18 @@
---
apiVersion: databricks.microsoft.com/v1alpha1
kind: Dcluster
metadata:
  name: dcluster-interactive1
spec:
  spark_version: latest-stable-scala2.11
  node_type_id: Standard_D3_v2
  autoscale:
    min_workers: 1
    max_workers: 2
  driver_node_type_id: Standard_D3_v2
  custom_tags:
    - key: a
      value: CustomTag1
  spark_env_vars:
    PYSPARK_PYTHON: /databricks/python3/bin/python3
  enable_elastic_disk: true
@@ -0,0 +1,17 @@
apiVersion: databricks.microsoft.com/v1alpha1
kind: Djob
metadata:
  name: djob-basic1
spec:
  # This spec is directly linked to the JobSettings structure
  # https://docs.databricks.com/api/latest/jobs.html#jobsettings
  existing_cluster_id: 1021-013622-bused793
  timeout_seconds: 3600
  max_retries: 1
  schedule:
    quartz_cron_expression: 0 0/1 * * * ?
    timezone_id: America/Los_Angeles
  notebook_task:
    base_parameters:
      "name": "Azadeh"
    notebook_path: "/samples/basic1"
1 change: 1 addition & 0 deletions docs/samples/3_secret_scope/eventhub_ingest.ipynb
@@ -0,0 +1 @@
{"cells":[{"cell_type":"code","source":["dseventhubs=\"ds-eventhubs\"\nprint(dseventhubs) "],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\">ds-eventhubs\n</div>"]}}],"execution_count":1},{"cell_type":"code","source":["EventhubNamespace=dbutils.secrets.get(scope=dseventhubs, key=\"EventhubNamespace\")\nEventhubName=dbutils.secrets.get(scope=dseventhubs, key=\"EventhubName\")\nSharedAccessKeyName=dbutils.secrets.get(scope=dseventhubs, key=\"SharedAccessKeyName\")\nSharedAccessKey=dbutils.secrets.get(scope=dseventhubs, key=\"SharedAccessKey\")"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"></div>"]}}],"execution_count":2},{"cell_type":"code","source":["from azure.eventhub import EventHubClient, Receiver, Offset\nADDRESS = \"amqps://{}.servicebus.windows.net/{}\".format(EventhubNamespace,EventhubName)\n\n\n\nCONSUMER_GROUP = \"$default\"\nOFFSET = Offset(\"-1\")\nPARTITION = \"0\"\n\ntotal = 0\nlast_sn = -1\nlast_offset = \"-1\"\n# Create Event Hubs client\nclient = EventHubClient(ADDRESS, debug=False, username=SharedAccessKeyName, password=SharedAccessKey)\n\nreceiver = client.add_receiver(\nCONSUMER_GROUP, PARTITION, prefetch=5000, 
offset=OFFSET)\nclient.run()\n\n\n\n\n"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"><span class=\"ansired\">Out[</span><span class=\"ansired\">3</span><span class=\"ansired\">]: </span>[]</div>"]}}],"execution_count":3},{"cell_type":"code","source":["batch = receiver.receive(timeout=1)\n\nwhile True:\n for event_data in batch:\n last_offset = event_data.offset\n last_sn = event_data.sequence_number\n #print(\"Msg offset: \" + str(last_offset))\n print(\"Msg seq: \" + str(last_sn))\n print(\"Msg body: \" + event_data.body_as_str(encoding='UTF-8'))\n total += 1\n batch = receiver.receive(timeout=1)"],"metadata":{},"outputs":[],"execution_count":4}],"metadata":{"name":"eventhub_ingest","notebookId":510540626668717},"nbformat":4,"nbformat_minor":0}
18 changes: 18 additions & 0 deletions docs/samples/3_secret_scope/run_twitter1.yaml
@@ -0,0 +1,18 @@
apiVersion: databricks.microsoft.com/v1alpha1
kind: Run
metadata:
  name: drun-twitteringest1
spec:
  # create a run directly without a job
  existing_cluster_id: 1021-013622-bused793
  libraries:
    - pypi:
        package: "tweepy"
    - pypi:
        package: "azure.eventhub"
    - maven:
        coordinates: "com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.9" # installs the azure event hubs library
  notebook_task:
    base_parameters:
      "filter": "#microsoft"
    notebook_path: "/samples/twitter_ingest"
37 changes: 37 additions & 0 deletions docs/samples/3_secret_scope/secretscope_eventhub.yaml
@@ -0,0 +1,37 @@
apiVersion: databricks.microsoft.com/v1alpha1
kind: SecretScope
metadata:
  name: ds-eventhubs
spec:
  initial_manage_permission: users
  secrets:
    - key: ConnectionString
      value_from:
        secret_key_ref:
          name: eventhubnamespace-secret
          key: ConnectionString
    - key: EventhubName
      string_value: twitter
    - key: EventhubNamespace
      value_from:
        secret_key_ref:
          name: eventhubnamespace-secret
          key: EventhubNamespace
    - key: SharedAccessKey
      value_from:
        secret_key_ref:
          name: eventhubnamespace-secret
          key: SharedAccessKey
    - key: SharedAccessKeyName
      value_from:
        secret_key_ref:
          name: eventhubnamespace-secret
          key: SharedAccessKeyName
    - key: ConnectionString
      value_from:
        secret_key_ref:
          name: eventhubnamespace-secret
          key: ConnectionString
  acls:
    - principal: [email protected]
      permission: READ
30 changes: 30 additions & 0 deletions docs/samples/3_secret_scope/secretscope_twitter.yaml
@@ -0,0 +1,30 @@
apiVersion: databricks.microsoft.com/v1alpha1
kind: SecretScope
metadata:
  name: ds-twitters
spec:
  initial_manage_permission: users
  secrets:
    - key: TwitterAPIkey
      value_from:
        secret_key_ref:
          name: mytwittersecret
          key: TwitterAPIkey
    - key: TwitterAPISecret
      value_from:
        secret_key_ref:
          name: mytwittersecret
          key: TwitterAPISecret
    - key: TwitterAccessToken
      value_from:
        secret_key_ref:
          name: mytwittersecret
          key: TwitterAccessToken
    - key: TwitterAccessSecret
      value_from:
        secret_key_ref:
          name: mytwittersecret
          key: TwitterAccessSecret
  acls:
    - principal: [email protected]
      permission: READ
1 change: 1 addition & 0 deletions docs/samples/3_secret_scope/twitter_ingest.ipynb
@@ -0,0 +1 @@
{"cells":[{"cell_type":"code","source":["dseventhubs=\"ds-eventhubs\"\ndstwitters=\"ds-twitters\"\n\nprint(dseventhubs) \nprint(dstwitters) \n"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\">ds-eventhubs\nds-twitters\n</div>"]}}],"execution_count":1},{"cell_type":"code","source":["TwitterAPIkey=dbutils.secrets.get(scope=dstwitters, key=\"TwitterAPIkey\")\nTwitterAPISecret=dbutils.secrets.get(scope=dstwitters, key=\"TwitterAPISecret\")\nTwitterAccessToken=dbutils.secrets.get(scope=dstwitters, key=\"TwitterAccessToken\")\nTwitterAccessSecret=dbutils.secrets.get(scope=dstwitters, key=\"TwitterAccessSecret\")"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"></div>"]}}],"execution_count":2},{"cell_type":"code","source":["EventhubNamespace=dbutils.secrets.get(scope=dseventhubs, key=\"EventhubNamespace\")\nEventhubName=dbutils.secrets.get(scope=dseventhubs, key=\"EventhubName\")\nSharedAccessKeyName=dbutils.secrets.get(scope=dseventhubs, key=\"SharedAccessKeyName\")\nSharedAccessKey=dbutils.secrets.get(scope=dseventhubs, key=\"SharedAccessKey\")"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n 
word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"></div>"]}}],"execution_count":3},{"cell_type":"code","source":["from azure.eventhub import EventHubClient, Sender, EventData\nADDRESS = \"amqps://{}.servicebus.windows.net/{}\".format(EventhubNamespace,EventhubName)\n\n\n# Create Event Hubs client\nclient = EventHubClient(ADDRESS, debug=False, username=SharedAccessKeyName, password=SharedAccessKey)\nsender = client.add_sender(partition=\"0\")\nclient.run()"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"><span class=\"ansired\">Out[</span><span class=\"ansired\">7</span><span class=\"ansired\">]: </span>[]</div>"]}}],"execution_count":4},{"cell_type":"code","source":["import tweepy\n\nauth = tweepy.OAuthHandler(TwitterAPIkey, TwitterAPISecret)\nauth.set_access_token(TwitterAccessToken, TwitterAccessSecret)"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"></div>"]}}],"execution_count":5},{"cell_type":"code","source":["class MyStreamListener(tweepy.StreamListener):\n \n def on_status(self, status):\n print(status.text)\n message = \"Message {}\".format(status.text)\n sender.send(EventData(message))\n\napi = 
tweepy.API(auth)\nmyStreamListener = MyStreamListener()\nmyStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)"],"metadata":{},"outputs":[{"metadata":{},"output_type":"display_data","data":{"text/html":["<style scoped>\n .ansiout {\n display: block;\n unicode-bidi: embed;\n white-space: pre-wrap;\n word-wrap: break-word;\n word-break: break-all;\n font-family: \"Source Code Pro\", \"Menlo\", monospace;;\n font-size: 13px;\n color: #555;\n margin-left: 4px;\n line-height: 19px;\n }\n</style>\n<div class=\"ansiout\"></div>"]}}],"execution_count":6},{"cell_type":"code","source":["filter = dbutils.widgets.get(\"filter\")\nprint(filter) \n"],"metadata":{},"outputs":[],"execution_count":7},{"cell_type":"code","source":["myStream.filter(track=[filter])"],"metadata":{},"outputs":[],"execution_count":8}],"metadata":{"name":"twitter_ingest","notebookId":510540626668702},"nbformat":4,"nbformat_minor":0}