Materials for the "Building a modern data platform with Python and open-source tools" workshop.
- Intro + set up - 1hr
- Intro to Prefect - 1.5hr
- Break - 15min
- Airbyte - existing source - 1hr
- Airbyte - new source - 1hr (bonus)
- Wrap up - 15min
We'll be using GitHub, Docker, Prefect Cloud, and Airbyte during the workshop. To save everyone's time, please make sure you have at minimum Git + Docker with Airbyte images set up before we begin (as it needs to download quite a lot of data).
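As a quick preflight, the prerequisite check above can be sketched in Python. This is only an illustration: the helper name `missing_tools` is hypothetical, and it only verifies that the `git` and `docker` executables are on your PATH, not that the Airbyte images have been downloaded.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Git and Docker are the minimum named above; extend the list as needed.
    missing = missing_tools(["git", "docker"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("git and docker found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out when trying the function without the tools installed.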
Windows is great for many things, but not yet for local data engineering development.
Run `wsl --install` from an administrator PowerShell or CMD to install WSL (you may need to install it manually on some Windows versions). This method downloads the Ubuntu image and sets the WSL version to 2. You may still need to enable Docker integration as described in the third bullet point below.
If you've already installed WSL when installing Docker, you have to:
- download and install the Ubuntu image from the Microsoft Store
- set the WSL version to 2:
wsl --set-version Ubuntu 2
- enable Docker in this Ubuntu image by checking the box next to Ubuntu in Docker -> Settings -> Resources -> WSL integration
I recommend using Windows Terminal to run all commands in WSL. You can find my settings here. Once set up, you can open a WSL shell like this:
Once in the shell, run `sudo apt update && sudo apt install python3-pip`.
- create a Personal Access Token: LINK
  Select "repo" as the scope.
- make Git remember the credentials:
git config --global credential.helper store
- run `docker login`
- Docker will ask you to provide your DockerHub username and password. Provide your username and an access token, which you can generate in hub.docker.com -> fingerprint icon -> Account Settings -> Security -> New Access Token. Make sure to save that token for step 4 of Workshop set up.
NOTE: Linux only (docker-compose is built into Docker on other systems).
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
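The download URL in the first command is assembled from the output of `uname -s` and `uname -m`. A minimal Python sketch of the same composition, assuming that `platform.system()` and `platform.machine()` report the same values as those two commands on your Linux machine (the function name `compose_download_url` is hypothetical):

```python
import platform


def compose_download_url(version="1.29.2", system=None, machine=None):
    """Build the docker-compose release URL used in the curl command above."""
    # On Linux these typically match `uname -s` (e.g. "Linux")
    # and `uname -m` (e.g. "x86_64").
    system = system or platform.system()
    machine = machine or platform.machine()
    return (
        "https://github.com/docker/compose/releases/download/"
        f"{version}/docker-compose-{system}-{machine}"
    )


if __name__ == "__main__":
    print(compose_download_url())
```

Printing the URL before downloading is a cheap way to confirm which binary the `curl` command will fetch for your architecture.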
Run the following commands:
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
Once you see an Airbyte banner, the UI is ready at localhost:8000.
This concludes the set up. Hit CTRL+C to spin down Airbyte.
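If you would rather not watch the logs for the banner, a small polling loop can tell you when the UI starts answering. The sketch below is illustrative only: the helper names `is_up` and `wait_until_up`, the retry count, and the delay are all assumptions, and any HTTP response (including an error status) is treated as "up".

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=3.0):
    """Return True if an HTTP request to `url` gets any response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def wait_until_up(url, attempts=30, delay=2.0, probe=is_up, sleep=time.sleep):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

From a second terminal, `wait_until_up("http://localhost:8000")` should return `True` once the UI is reachable and `False` if it never responds within the configured attempts.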
- go to https://cloud.prefect.io
- choose the Free plan. If that is not possible, register for the "Starter" plan
  - Note: the Starter plan requires your credit card details. It has a free tier of 20,000 task runs/month, which for personal and educational use is basically infinite, and it's very easy to delete the account after the workshop.
- create an API key:
  - click on the face logo in the top right corner, then Account Settings -> API Keys
  - click "CREATE AN API KEY"
  - choose a name, e.g. "dyvenia_elt_workshop"
  - choose an expiration date (for us, a month is enough)
  - click "CREATE"
  - save it for now; we'll use it in step 3
- download: LINK
- install extensions:
code --install-extension ms-python.python && \
code --install-extension ms-vscode-remote.vscode-remote-extensionpack && \
code --install-extension MS-vsliveshare.vsliveshare-pack && \
code --install-extension njpwerner.autodocstring
- clone the ELT workshop repo (provide your personal access token as the password if required):
git clone https://github.com/dyvenia/elt_workshop.git
- create a `.env` file in the `docker` folder (you can also just remove the `.EXAMPLE` from the example `.env` file)
- provide values for the three variables at the top (`DOCKERHUB_USER`, `DOCKERHUB_TOKEN`, `PREFECT_API_KEY`)
- in the `elt_workshop` folder, run `sh scripts/setup.sh`
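Before running the setup script, it can help to confirm that the three variables in the `.env` file actually have values. The following is a minimal sketch, assuming the file uses plain `KEY=VALUE` lines; the helper names `parse_env` and `missing_values` are hypothetical and are not part of the workshop repo.

```python
REQUIRED = ("DOCKERHUB_USER", "DOCKERHUB_TOKEN", "PREFECT_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_values(text, required=REQUIRED):
    """Return the required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

Run from the `elt_workshop` folder, `missing_values(open("docker/.env").read())` should return an empty list once all three variables are filled in.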