This repo contains the code for a data platform with 2 environments:

- dev: A development platform running on Docker
- prd: A production platform running on AWS + k8s
- Create + activate virtual env:

```sh
python3 -m venv .venv/dpenv
source .venv/dpenv/bin/activate
```
- Install + init `phidata`:

```sh
pip install phidata
phi init -l
```

If you encounter errors, try updating pip:

```sh
python -m pip install --upgrade pip
```
- Set up the workspace:

```sh
phi ws setup
```
- Copy secrets:

```sh
cp -r workspace/example_secrets workspace/secrets
```
- Run dev platform on Docker:

```sh
phi ws up dev:docker
```

If something fails, run again with debug logs:

```sh
phi ws up -d
```
- Optional: Create `.env` file:

```sh
cp example.env .env
```
The `workspace/dev` directory contains the code for the dev environment. Run it using:

```sh
phi ws up dev:docker
```
- Set `dev_airflow_enabled=True` in `workspace/settings.py` (see the sketch after this list) and run `phi ws up dev:docker`
- Check out the Airflow webserver running in the `airflow-ws-container`:
  - url: http://localhost:8310/
  - user: `admin`
  - pass: `admin`
- Set `dev_jupyter_enabled=True` in `workspace/settings.py` and run `phi ws up dev:docker`
- Check out JupyterLab running in the `jupyter-container`:
  - url: http://localhost:8888/
  - pass: `admin`
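For reference, these flags are plain booleans in `workspace/settings.py`. A minimal sketch, assuming module-level assignments (the rest of your settings file will differ):

```python
# workspace/settings.py (sketch) -- only the two flags used above
dev_airflow_enabled = True   # `phi ws up dev:docker` now starts the Airflow containers
dev_jupyter_enabled = True   # and the JupyterLab container
```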
Validate the workspace using:

```sh
./scripts/validate.sh
```

This will:

- Format using `black`
- Type check using `mypy`
- Test using `pytest`
- Lint using `ruff`

If you need to install these packages, run:

```sh
pip install black mypy pytest ruff
```
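As a sketch, the script amounts to running the four tools in sequence and stopping at the first failure; the actual `./scripts/validate.sh` may pass extra flags or target specific directories:

```sh
#!/bin/bash
# Sketch of the validate flow described above; the real script may differ.
set -e           # stop at the first failing check
black .          # format
mypy .           # type check
pytest           # test
ruff check .     # lint
```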
Install the workspace & python packages in the virtual env using:

```sh
./scripts/install.sh
```

This will:

- Install python packages from `requirements.txt`
- Install the workspace in `--editable` mode
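A minimal sketch of that flow, assuming the script runs from the repo root (the actual `./scripts/install.sh` may differ):

```sh
#!/bin/bash
# Sketch of the install flow; the real script may differ.
set -e
pip install -r requirements.txt   # python packages
pip install --editable .          # the workspace itself, in editable mode
```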
Following PEP-631, add dependencies to the `pyproject.toml` file.

To add a new package:

- Add the module to the `pyproject.toml` file (example below).
- Run `./scripts/upgrade.sh` to update the `requirements.txt` file.
- Run `phi ws up dev:docker -f` to recreate images + containers.
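Under PEP-631, dependencies live in the `[project]` table of `pyproject.toml`. A sketch, where `pandas` stands in for whatever package you are adding (it is not necessarily used by this platform):

```toml
# pyproject.toml (sketch)
[project]
name = "data-platform"
dependencies = [
  "phidata",
  "pandas",  # hypothetical new package
]
```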
Airflow requirements are stored in the `workspace/dev/airflow/resources/requirements-airflow.txt` file.

To add new airflow providers:

- Add the module to the `workspace/dev/airflow/resources/requirements-airflow.txt` file (example below).
- Run `phi ws up -f --name airflow` to recreate images + containers.
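For example, to add a provider (the Snowflake provider below is purely illustrative), append a line like:

```txt
# workspace/dev/airflow/resources/requirements-airflow.txt (sketch)
apache-airflow-providers-snowflake
```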
Shut down the workspace using:

```sh
phi ws down
```

Restart the workspace using:

```sh
phi ws restart
```
The containers read env variables using the `env_file` param and secrets using the `secrets_file` param, which by default point to files in the `workspace/env` and `workspace/secrets` directories.

To add env variables to your airflow containers:

- Update the `workspace/env/dev_airflow_env.yml` file (sketch below).
- Restart all airflow containers using `phi ws restart dev:docker:airflow`
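A sketch of the env file, assuming a flat `KEY: value` mapping (the variable name below is hypothetical):

```yaml
# workspace/env/dev_airflow_env.yml (sketch)
MY_ENV_VAR: "my-value"  # hypothetical variable
```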
To add secret variables to your airflow containers:

- Update the `workspace/secrets/dev_airflow_secrets.yml` file (sketch below).
- Restart all airflow containers using `phi ws restart dev:docker:airflow`
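The secrets file follows the same shape; keep real values out of version control. A sketch with a hypothetical key:

```yaml
# workspace/secrets/dev_airflow_secrets.yml (sketch)
MY_SECRET_KEY: "change-me"  # hypothetical secret
```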
```sh
# ssh into airflow-worker | airflow-ws
docker exec -it airflow-ws-container zsh
docker exec -it airflow-worker-container zsh
```

```sh
# Test run the DAGs using module name
python -m workflow.dir.file

# Test run the DAG file
python /mnt/workspaces/data-platform/workflow/dir/file.py
```

```sh
# List DAGs
airflow dags list

# List tasks in DAG
airflow tasks list \
  -S /mnt/workspaces/data-platform/workflow/dir/file.py \
  -t dag_name

# Test airflow task
airflow tasks test dag_name task_name 2022-07-01
```