Databricks is a platform for working with big data. Azure has solid integration with Databricks, making it easy to set up and start working with datasets.
Read and write data in Azure Databricks
In this workshop, you'll use Databricks, a powerful platform for working with big data. We'll cover reading and writing different data formats in Azure Databricks, including JSON, Parquet, and CSV. You'll also learn how to read and operate on data stored in Databricks.
| Goal | Description |
|---|---|
| What will you learn | Read and write different types of data in Azure Databricks |
| What you'll need | An Azure subscription |
| Duration | 1 hour |
| Slides | PowerPoint |
Attendees should have an Azure account. There are several options that can give you free credits.
For this workshop, students will go through the Read and write data in Azure Databricks Learn module, which directs them to import Jupyter notebooks and run them in Databricks. These notebooks show how to read and write data in Azure Databricks.
Databricks is one of the most prominent platforms for working with big data and collaborating on data science tasks. In this workshop, you will get started with the platform in Azure and perform data interactions including reading, writing, and analyzing datasets.
Attach a cluster to a workspace:
Start here to create an Azure Databricks cluster and go through the notebook to read data. This notebook covers the following (a short PySpark sketch follows the list):
- Cover the `SparkSession` class
- Read a very simple TSV (tab-separated values) file
- Transform the TSV into a CSV (comma-separated values) file
- Read the CSV file with an inferred schema and then with a user-defined schema
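The notebook walks through these steps interactively; the following is a rough PySpark sketch of the same flow. The file paths and column names here are placeholders, not the notebook's actual data.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# In a Databricks notebook the `spark` session already exists; this line only
# matters if you run the sketch somewhere else.
spark = SparkSession.builder.getOrCreate()

# Read a tab-separated file by overriding the default comma delimiter.
tsv_df = (spark.read
          .option("header", "true")
          .option("sep", "\t")
          .csv("/tmp/sample.tsv"))  # placeholder path

# Write the same data back out as a comma-separated file.
tsv_df.write.mode("overwrite").option("header", "true").csv("/tmp/sample_csv")

# Read the CSV with an inferred schema: Spark scans the data to guess types.
inferred_df = (spark.read
               .option("header", "true")
               .option("inferSchema", "true")
               .csv("/tmp/sample_csv"))
inferred_df.printSchema()

# Read it again with a user-defined schema, which skips the inference scan
# and makes the column types explicit. These columns are hypothetical.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("value", IntegerType(), True),
])
defined_df = (spark.read
              .option("header", "true")
              .schema(schema)
              .csv("/tmp/sample_csv"))
defined_df.printSchema()
```

Defining the schema up front is generally faster on large files because Spark does not need an extra pass over the data to infer types.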
Continue with the next notebook to read JSON data after completing the CSV notebook. Go through the steps required to read and load the JSON data into Databricks. In this section you will (see the sketch after the list):
- Load a JSON file into Databricks
- Use schema inference to auto-detect the types of values and fields in the JSON file
- Create a user-defined schema with `pyspark` to load the JSON
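As a rough sketch of what that looks like in PySpark, assuming a placeholder file path and hypothetical field names:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()  # predefined in a Databricks notebook

# Schema inference: Spark samples the JSON to detect field names and types.
inferred_df = spark.read.json("/tmp/events.json")  # placeholder path
inferred_df.printSchema()

# User-defined schema: explicit types, no inference pass over the file.
schema = StructType([
    StructField("id", LongType(), True),       # hypothetical fields
    StructField("event", StringType(), True),
])
defined_df = spark.read.schema(schema).json("/tmp/events.json")
defined_df.show(5)
```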
Work through the next notebook (3.Reading Data - Parquet) to load Parquet files, which usually come with a pre-defined schema. Similar to the other milestones, in this section you will (a short sketch follows the list):
- Load a Parquet file with a pre-defined schema
- Understand why Parquet files usually come with a schema
- Read data from the Parquet file and visualize some of its details
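A minimal sketch of the idea, assuming a placeholder file path: because Parquet stores its schema in the file's own metadata, no inference step is needed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined in a Databricks notebook

# Parquet files carry their schema in the file footer, so Spark reads the
# column names and types directly from the file's metadata.
parquet_df = spark.read.parquet("/tmp/sample.parquet")  # placeholder path

parquet_df.printSchema()           # schema comes from the file itself
parquet_df.show(5)                 # preview a few rows
print(parquet_df.count(), "rows")  # one quick detail about the data
```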
Go to the next notebook (4.Reading Data - Tables and Views) to read data stored in tables and views. In this section you will see how to register tables and read from a table or view (a short sketch follows the list). You will:
- Register a table into Databricks
- Use a TSV file to create a new table
- Visualize the loaded data in the UI
- Create a temporary view
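A short sketch of registering a table and a temporary view, assuming a placeholder TSV path and hypothetical table and view names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined in a Databricks notebook

# Load a TSV file to use as the table's data (placeholder path).
tsv_df = (spark.read
          .option("header", "true")
          .option("sep", "\t")
          .csv("/tmp/people.tsv"))

# Save it as a managed table; it then appears in the Databricks Data UI.
tsv_df.write.mode("overwrite").saveAsTable("people")

# Create a temporary view, which only lives for the current Spark session.
tsv_df.createOrReplaceTempView("people_view")

# Both the table and the view can now be queried with SQL.
spark.sql("SELECT * FROM people_view LIMIT 5").show()
```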
Finally, go to the last notebook (5.Writing Data), where you will write data using Parquet files. You'll load a TSV file into Databricks and then save it as a Parquet file using the PySpark Python API; a short sketch follows.
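As a rough sketch of that last step, with placeholder paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined in a Databricks notebook

# Load the TSV source data (placeholder path).
tsv_df = (spark.read
          .option("header", "true")
          .option("sep", "\t")
          .csv("/tmp/source.tsv"))

# Save it as Parquet: a columnar format that stores its schema with the data.
tsv_df.write.mode("overwrite").parquet("/tmp/output.parquet")

# Read it back to confirm the schema survived the round trip.
spark.read.parquet("/tmp/output.parquet").printSchema()
```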
Verify your knowledge with a short quiz
There is a slightly more advanced and involved learning path that covers Data Engineering with Azure Databricks.
In this workshop you used small datasets that are easy to work with. Try using larger datasets to take full advantage of a platform like Databricks.
Be sure to give feedback about this workshop!