Run some example in Kubernetes execution mode in CI #1127

Merged
pankajastro merged 64 commits into main from kube_mode_ci
Aug 15, 2024

@pankajastro pankajastro commented Jul 29, 2024

Description

Migrate example from cosmos-example

The cosmos-example repository currently contains several examples, including those that run in Kubernetes execution mode. Because they live in a separate repository, it is hard to test local changes in Kubernetes execution mode and to keep the documentation up to date. It therefore makes sense to migrate the Kubernetes examples from cosmos-example to this repository. This PR does so as follows:

  • Migrated the jaffle_shop_kubernetes example DAG to this repository.
  • Moved the Dockerfile from cosmos-example to this repository to build the image with the necessary DAGs and dbt projects.
    I also adjusted both the example DAG and Dockerfile to work within this repository.

Automate running locally

I introduced some scripts to make running the Kubernetes DAG easier.

postgres-deployment.yaml: Kubernetes resource file for spinning up PostgreSQL and creating Kubernetes secrets.

integration-kubernetes.sh: Runs the Kubernetes DAG using pytest.

kubernetes-setup.sh:

  • Builds the Docker image with the Jaffle Shop dbt project and DAG, and loads the Docker image into the local registry.
  • Creates Kubernetes resources such as PostgreSQL deployment, service, and secret.
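
The two responsibilities above can be sketched roughly as follows. This is not the contents of kubernetes-setup.sh; it is a minimal illustration that assumes the image reaches the cluster through `kind load docker-image`. The image tag, Dockerfile path, manifest path, the `run` helper, and the `DRY_RUN` switch are all hypothetical names introduced here.

```shell
# Assumed names for illustration; the real script may use different ones.
IMAGE="${IMAGE:-jaffle-shop-dbt:local}"            # hypothetical image tag
DOCKERFILE="${DOCKERFILE:-Dockerfile}"             # hypothetical Dockerfile path
MANIFEST="${MANIFEST:-postgres-deployment.yaml}"   # hypothetical manifest path

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_kubernetes() {
  # Build the image that bundles the dbt project and the DAG.
  run docker build -t "$IMAGE" -f "$DOCKERFILE" . || return 1
  # Copy the image onto the KinD cluster nodes.
  run kind load docker-image "$IMAGE" || return 1
  # Create the PostgreSQL deployment, service, and secret.
  run kubectl apply -f "$MANIFEST" || return 1
}
```

Setting `DRY_RUN=1` before calling `setup_kubernetes` prints the three commands without touching Docker or the cluster, which is a cheap way to review the sequence first.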

Run DAG locally
Prerequisites:

  • Docker Desktop
  • KinD (Kubernetes in Docker)
  • kubectl
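
Before starting, it may help to confirm that the command-line tools are on PATH. The helper below is a small sketch; the function names `missing_tools` and `check_prerequisites` are hypothetical, and the check only looks for the `docker`, `kind`, and `kubectl` executables, not for a running Docker daemon.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || printf '%s\n' "$_tool"
  done
}

# Report missing prerequisites; returns non-zero if any are absent.
check_prerequisites() {
  _missing="$(missing_tools docker kind kubectl)"
  if [ -n "$_missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$_missing" >&2
    return 1
  fi
  echo "All prerequisites found."
}
```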

Steps:

  1. Create cluster: kind create cluster
  2. Create resources: scripts/test/kubernetes-setup.sh (this sets up PostgreSQL and loads the image containing the dbt project into the local registry)
  3. Run DAG: cd dev && scripts/test/integration-kubernetes.sh. This executes the DAG with pytest. Alternatively, if the project is installed in your virtual env, you can run the DAG directly with the airflow command:
time AIRFLOW__COSMOS__PROPAGATE_LOGS=0 AIRFLOW__COSMOS__ENABLE_CACHE=1 AIRFLOW__COSMOS__CACHE_DIR=/tmp/ AIRFLOW_CONN_EXAMPLE_CONN="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 airflow dags test jaffle_shop_kubernetes `date -Iseconds`
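
The one-line command is hard to read, so here is the same invocation spelled out as a function, as a sketch only. It uses the same environment variables and values as the command above, except that the connection URI is not hardcoded: the caller is assumed to set `AIRFLOW_CONN_EXAMPLE_CONN` beforehand. The function name and the `AIRFLOW_BIN` override are hypothetical additions for illustration.

```shell
run_jaffle_shop_kubernetes() {
  # The connection URI is intentionally left to the caller.
  if [ -z "${AIRFLOW_CONN_EXAMPLE_CONN:-}" ]; then
    echo "Set AIRFLOW_CONN_EXAMPLE_CONN to the Postgres connection URI first." >&2
    return 1
  fi
  (
    # Subshell: the exports below do not leak into the calling shell.
    export AIRFLOW_CONN_EXAMPLE_CONN
    export AIRFLOW__COSMOS__PROPAGATE_LOGS=0
    export AIRFLOW__COSMOS__ENABLE_CACHE=1
    export AIRFLOW__COSMOS__CACHE_DIR=/tmp/
    export PYTHONPATH="$PWD"
    export AIRFLOW_HOME="$PWD"
    export AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000
    export AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000
    # AIRFLOW_BIN is a hypothetical override; it defaults to the airflow CLI.
    "${AIRFLOW_BIN:-airflow}" dags test jaffle_shop_kubernetes "$(date -Iseconds)"
  )
}
```

Run it from the repository root, since both `PYTHONPATH` and `AIRFLOW_HOME` are taken from the current directory, as in the original command.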

Run jaffle_shop_kubernetes in CI

To avoid regressions, the jaffle_shop_kubernetes DAG now runs automatically in CI:

  • Set up the GitHub Actions infrastructure to run DAGs using Kubernetes execution mode.
  • Used container-tools/kind-action@v1 to create a KinD cluster.
  • Used the bash scripts to create the Kubernetes resources, build and load the image into a local registry, and execute the tests.
  • At the moment, pytest runs from a virtual env.

Documentation changes

Given that the DAG jaffle_shop_kubernetes is now part of this repository, I have automated the example rendering for Kubernetes execution mode. This ensures that we avoid displaying outdated example code.

https://astronomer.github.io/astronomer-cosmos/getting_started/execution-modes.html#kubernetes
Screenshot 2024-08-15 at 8 03 59 PM

https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html#kubernetes

Screenshot 2024-08-15 at 8 04 22 PM

Future work

  • Use the hatch target to run the test. I have introduced a hatch target to run the Kubernetes example, but it currently does not work because the dbt project path differs between the local environment and the container. This requires a bit more work.
  • Remove the virtual environment step (Install packages and dependencies) in the CI configuration for Run-Kubernetes-Tests and use hatch instead.
  • Update the profile YAML to use environment variables for the port, as it is currently hardcoded.
  • Remove the host from the Kubernetes secret, replace it with the username, and make the corresponding change in the DAG.
  • Currently, we need to export both POSTGRES_DATABASE and POSTGRES_DB in the Dockerfile because both are used in the project. Instead of exporting both, make the environment variable name consistent across the repository.
  • Not a big deal in this context, but we have some hardcoded values for secrets. It would be better to parameterize them.
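
Until the variable names are unified, one possible stopgap for the POSTGRES_DATABASE / POSTGRES_DB item is to derive one variable from the other so the two cannot drift apart. The sketch below assumes POSTGRES_DB is treated as the source of truth, which is an arbitrary choice made here; the function name is hypothetical.

```shell
# Derive POSTGRES_DATABASE from POSTGRES_DB (assumed to be the source of truth).
sync_postgres_db_vars() {
  if [ -z "${POSTGRES_DB:-}" ]; then
    echo "POSTGRES_DB is not set" >&2
    return 1
  fi
  export POSTGRES_DATABASE="$POSTGRES_DB"
}
```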

GH issue for future improvement: #1160

Example CI Run

Related Issue(s)

closes: #535

Breaking Change?

No

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

netlify bot commented Jul 29, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 05cedca
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66be350363b2af0007e7a6e4

@pankajastro pankajastro changed the title Kube mode ci @pankajastro Run some example in Kubernetes execution mode in CI Jul 29, 2024
@pankajastro pankajastro changed the title @pankajastro Run some example in Kubernetes execution mode in CI Run some example in Kubernetes execution mode in CI Jul 29, 2024
@pankajastro pankajastro changed the title Run some example in Kubernetes execution mode in CI [WIP] Run some example in Kubernetes execution mode in CI Jul 29, 2024

codecov bot commented Jul 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.37%. Comparing base (a89389d) to head (05cedca).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1127   +/-   ##
=======================================
  Coverage   96.37%   96.37%           
=======================================
  Files          64       64           
  Lines        3424     3424           
=======================================
  Hits         3300     3300           
  Misses        124      124           


@pankajastro pankajastro marked this pull request as ready for review August 15, 2024 15:18
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. area:ci Related to CI, Github Actions, or other continuous integration tools area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc execution:kubernetes Related to Kubernetes execution environment profile:postgres Related to Postgres ProfileConfig labels Aug 15, 2024
@tatiana tatiana modified the milestones: Cosmos 1.7.0, Cosmos 1.6.0 Aug 15, 2024
Collaborator

@tatiana tatiana left a comment

This looks great, @pankajastro , really happy this is automated, it will save us lots of time!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 15, 2024
@pankajastro pankajastro merged commit e1ff924 into main Aug 15, 2024
59 checks passed
@pankajastro pankajastro deleted the kube_mode_ci branch August 15, 2024 17:16
@pankajkoti pankajkoti mentioned this pull request Aug 16, 2024
pankajkoti added a commit that referenced this pull request Aug 20, 2024
New Features

* Add support for loading manifest from cloud stores using Airflow
Object Storage by @pankajkoti in #1109
* Cache ``package-lock.yml`` file by @pankajastro in #1086
* Support persisting the ``LoadMode.VIRTUALENV`` directory by @tatiana
in #1079
* Add support to store and fetch ``dbt ls`` cache in remote stores by
@pankajkoti in #1147
* Add default source nodes rendering by @arojasb3 in #1107
* Add Teradata ``ProfileMapping`` by @sc250072 in #1077

Enhancements

* Add ``DatabricksOauthProfileMapping`` profile by @CorsettiS in #1091
* Use ``dbt ls`` as the default parser when ``profile_config`` is
provided by @pankajastro in #1101
* Add task owner to dbt operators by @wornjs in #1082
* Extend Cosmos custom selector to support + when using paths and tags
by @mvictoria in #1150
* Simplify logging by @dwreeves in #1108

Bug fixes

* Fix Teradata ``ProfileMapping`` target invalid issue by @sc250072 in
#1088
* Fix empty tag in case of custom parser by @pankajastro in #1100
* Fix ``dbt deps`` of ``LoadMode.DBT_LS`` should use
``ProjectConfig.dbt_vars`` by @tatiana in #1114
* Fix import handling by lazy loading hooks introduced in PR #1109 by
@dwreeves in #1132
* Fix Airflow 2.10 regression and add Airflow 2.10 in test matrix by
@pankajastro in #1162

Docs

* Fix typo in azure-container-instance docs by @pankajastro in #1106
* Use Airflow trademark as it has been registered by @pankajastro in
#1105

Others

* Run some example DAGs in Kubernetes execution mode in CI by
@pankajastro in #1127
* Install requirements.txt by default during dev env spin up by
@CorsettiS in #1099
* Remove ``DbtGraph.current_version`` dead code by @tatiana in #1111
* Disable test for Airflow-2.5 and Python-3.11 combination in CI by
@pankajastro in #1124
* Pre-commit hook updates in #1074, #1113, #1125, #1144, #1154,  #1167

---------

Co-authored-by: Pankaj Koti <[email protected]>
Co-authored-by: Pankaj Singh <[email protected]>
@tatiana tatiana modified the milestones: Cosmos 1.6.0, Cosmos 1.6.1 Sep 25, 2024
Labels
area:ci Related to CI, Github Actions, or other continuous integration tools area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc execution:kubernetes Related to Kubernetes execution environment lgtm This PR has been approved by a maintainer profile:postgres Related to Postgres ProfileConfig size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create ExecutionMode.KUBERNETES example DAG & setup CI
2 participants