Add performance integration tests (astronomer#827)
## Description

This PR adds a step to our CI to measure how quickly Cosmos can run
models. This is part of a larger initiative to make the project more
performant now that it's reached a certain level of maturity.

How it works:
- We now have [a test that generates a dbt project with a configurable number of sequential models](https://github.com/astronomer/astronomer-cosmos/blob/performance-int-tests/tests/perf/test_performance.py) (based on a parameter that gets passed in), runs a simple DAG, and measures task throughput (in models run per second)
- I've extended our CI to run this test for 1, 10, 50, and 100 models to start
- The CI job reports the results as a GitHub Actions output that gets shown in the actions summary, [at the bottom](https://github.com/astronomer/astronomer-cosmos/actions/runs/7894490582)
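To make the mechanics concrete, here is a minimal Python sketch of what "generate N sequential models and measure throughput" can look like. This is illustrative only, not the actual code in `tests/perf/test_performance.py`; the function names are hypothetical. Each generated model `ref`s the previous one, so the scheduler is forced to run them serially:

```python
import time


def generate_sequential_models(num_models: int) -> dict[str, str]:
    """Build a mapping of model name -> SQL where each model selects
    from the previous one, producing a strictly sequential dbt DAG."""
    models = {}
    for i in range(num_models):
        if i == 0:
            sql = "SELECT 1 AS id"
        else:
            # Double braces render as literal {{ ref('model_N') }} Jinja.
            sql = f"SELECT * FROM {{{{ ref('model_{i - 1}') }}}}"
        models[f"model_{i}"] = sql
    return models


def measure_throughput(run_dag, num_models: int) -> float:
    """Time a DAG run and return models executed per second."""
    start = time.perf_counter()
    run_dag()  # e.g. trigger the performance DAG and wait for completion
    elapsed = time.perf_counter() - start
    return num_models / elapsed
```

Because the models form a single chain, throughput here measures per-task scheduling and execution overhead rather than parallelism.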

While this isn't perfect, it's a step in the right direction - we now
have some general visibility! Note that these numbers may not be
indicative of a production Airflow environment running something like
the Kubernetes executor, because this runs the local executor on GitHub
Actions runners. Still, it's meant as a benchmark to show whether we're
moving in the right direction.

As part of this, I've also refactored our tests to call shell scripts
from the pyproject file instead of embedding the commands directly in
it. This should make them easier to maintain and track changes to.
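For reference, the results file the test writes uses a simple `KEY={value}` line format (`NUM_MODELS=...` and `TIME=...`), which the CI step appends to the Actions step summary. A small illustrative sketch of turning that back into a throughput figure (these helper names are hypothetical, not part of the test suite):

```python
def parse_results(text: str) -> dict[str, float]:
    """Parse KEY=value lines (e.g. NUM_MODELS=10, TIME=4.2) into floats."""
    results = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        results[key] = float(value)
    return results


def throughput(results: dict[str, float]) -> float:
    """Models run per second, as reported in the Actions summary."""
    return results["NUM_MODELS"] / results["TIME"]
```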


## Related Issue(s)

astronomer#800

## Breaking Change?


## Checklist

- [ ] I have made corresponding changes to the documentation (if required)
- [ ] I have added tests that prove my fix is effective or that my feature works
jlaneve authored and arojasb3 committed Jul 14, 2024
1 parent 3c98fff commit 9fe2f1d
Showing 24 changed files with 351 additions and 104 deletions.
59 changes: 53 additions & 6 deletions .github/workflows/test.yml
@@ -11,10 +11,8 @@ concurrency:
cancel-in-progress: true

jobs:

Authorize:
environment:
${{ github.event_name == 'pull_request_target' &&
environment: ${{ github.event_name == 'pull_request_target' &&
github.event.pull_request.head.repo.full_name != github.repository &&
'external' || 'internal' }}
runs-on: ubuntu-latest
@@ -30,8 +28,8 @@ jobs:

- uses: actions/setup-python@v3
with:
python-version: '3.9'
architecture: 'x64'
python-version: "3.9"
architecture: "x64"

- run: pip3 install hatch
- run: hatch run tests.py3.9-2.7:type-check
@@ -294,6 +292,55 @@ jobs:
AIRFLOW_CONN_AIRFLOW_DB: postgres://postgres:[email protected]:5432/postgres
PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH

Run-Performance-Tests:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11"]
airflow-version: ["2.7"]
num-models: [1, 10, 50, 100]

steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha || github.ref }}
- uses: actions/cache@v3
with:
path: |
~/.cache/pip
.nox
key: perf-test-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.airflow-version }}-${{ hashFiles('pyproject.toml') }}-${{ hashFiles('cosmos/__init__.py') }}

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

- name: Install packages and dependencies
run: |
python -m pip install hatch
hatch -e tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }} run pip freeze
- name: Run performance tests against Airflow ${{ matrix.airflow-version }} and Python ${{ matrix.python-version }}
id: run-performance-tests
run: |
hatch run tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }}:test-performance-setup
hatch run tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }}:test-performance
# read the performance results and set them as an env var for the next step
# format: NUM_MODELS={num_models}\nTIME={end - start}\n
cat /tmp/performance_results.txt > $GITHUB_STEP_SUMMARY
env:
AIRFLOW_HOME: /home/runner/work/astronomer-cosmos/astronomer-cosmos/
AIRFLOW_CONN_AIRFLOW_DB: postgres://postgres:[email protected]:5432/postgres
AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: 90.0
PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH
MODEL_COUNT: ${{ matrix.num-models }}

env:
AIRFLOW_HOME: /home/runner/work/astronomer-cosmos/astronomer-cosmos/
AIRFLOW_CONN_AIRFLOW_DB: postgres://postgres:[email protected]:5432/postgres
PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH

Code-Coverage:
if: github.event.action != 'labeled'
@@ -309,7 +356,7 @@ jobs:
- name: Set up Python 3.11
uses: actions/setup-python@v3
with:
python-version: '3.11'
python-version: "3.11"
- name: Install coverage
run: |
pip3 install coverage
4 changes: 4 additions & 0 deletions dev/dags/dbt/perf/.gitignore
@@ -0,0 +1,4 @@

target/
dbt_packages/
logs/
3 changes: 3 additions & 0 deletions dev/dags/dbt/perf/README.md
@@ -0,0 +1,3 @@
dbt project for running performance tests.

The `models` directory gets populated by an integration test defined in `tests/perf`.
Empty file.
17 changes: 17 additions & 0 deletions dev/dags/dbt/perf/dbt_project.yml
@@ -0,0 +1,17 @@
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: "perf"
version: "1.0.0"
config-version: 2

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_packages"
Empty file.
11 changes: 11 additions & 0 deletions dev/dags/dbt/perf/profiles.yml
@@ -0,0 +1,11 @@
simple:
target: dev
outputs:
dev:
type: sqlite
threads: 1
database: "database"
schema: "main"
schemas_and_paths:
main: "{{ env_var('DBT_SQLITE_PATH') }}/imdb.db"
schema_directory: "{{ env_var('DBT_SQLITE_PATH') }}"
Empty file.
Empty file.
Empty file.
36 changes: 36 additions & 0 deletions dev/dags/performance_dag.py
@@ -0,0 +1,36 @@
"""
A DAG that uses Cosmos to render a dbt project for performance testing.
"""

import airflow
from datetime import datetime
import os
from pathlib import Path

from cosmos import DbtDag, ProjectConfig, ProfileConfig, RenderConfig

DEFAULT_DBT_ROOT_PATH = Path(__file__).parent / "dbt"
DBT_ROOT_PATH = Path(os.getenv("DBT_ROOT_PATH", DEFAULT_DBT_ROOT_PATH))
DBT_SQLITE_PATH = str(DEFAULT_DBT_ROOT_PATH / "data")

profile_config = ProfileConfig(
profile_name="simple",
target_name="dev",
profiles_yml_filepath=(DBT_ROOT_PATH / "simple/profiles.yml"),
)

cosmos_perf_dag = DbtDag(
project_config=ProjectConfig(
DBT_ROOT_PATH / "perf",
env_vars={"DBT_SQLITE_PATH": DBT_SQLITE_PATH},
),
profile_config=profile_config,
render_config=RenderConfig(
dbt_deps=False,
),
# normal dag parameters
schedule_interval=None,
start_date=datetime(2024, 1, 1),
catchup=False,
dag_id="performance_dag",
)
122 changes: 28 additions & 94 deletions pyproject.toml
@@ -9,16 +9,8 @@ description = "Orchestrate your dbt projects in Airflow"
readme = "README.rst"
license = "Apache-2.0"
requires-python = ">=3.8"
authors = [
{ name = "Astronomer", email = "[email protected]" },
]
keywords = [
"airflow",
"apache-airflow",
"astronomer",
"dags",
"dbt",
]
authors = [{ name = "Astronomer", email = "[email protected]" }]
keywords = ["airflow", "apache-airflow", "astronomer", "dags", "dbt"]
classifiers = [
"Development Status :: 3 - Alpha",
"Environment :: Web Environment",
@@ -56,48 +48,23 @@ dbt-all = [
"dbt-spark",
"dbt-vertica",
]
dbt-athena = [
"dbt-athena-community",
"apache-airflow-providers-amazon>=8.0.0",
]
dbt-bigquery = [
"dbt-bigquery",
]
dbt-databricks = [
"dbt-databricks",
]
dbt-exasol = [
"dbt-exasol",
]
dbt-postgres = [
"dbt-postgres",
]
dbt-redshift = [
"dbt-redshift",
]
dbt-snowflake = [
"dbt-snowflake",
]
dbt-spark = [
"dbt-spark",
]
dbt-vertica = [
"dbt-vertica<=1.5.4",
]
openlineage = [
"openlineage-integration-common",
"openlineage-airflow",
]
all = [
"astronomer-cosmos[dbt-all]",
"astronomer-cosmos[openlineage]"
]
docs =[
dbt-athena = ["dbt-athena-community", "apache-airflow-providers-amazon>=8.0.0"]
dbt-bigquery = ["dbt-bigquery"]
dbt-databricks = ["dbt-databricks"]
dbt-exasol = ["dbt-exasol"]
dbt-postgres = ["dbt-postgres"]
dbt-redshift = ["dbt-redshift"]
dbt-snowflake = ["dbt-snowflake"]
dbt-spark = ["dbt-spark"]
dbt-vertica = ["dbt-vertica<=1.5.4"]
openlineage = ["openlineage-integration-common", "openlineage-airflow"]
all = ["astronomer-cosmos[dbt-all]", "astronomer-cosmos[openlineage]"]
docs = [
"sphinx",
"pydata-sphinx-theme",
"sphinx-autobuild",
"sphinx-autoapi",
"apache-airflow-providers-cncf-kubernetes>=5.1.1"
"apache-airflow-providers-cncf-kubernetes>=5.1.1",
]
tests = [
"packaging",
@@ -137,9 +104,7 @@ Documentation = "https://astronomer.github.io/astronomer-cosmos"
path = "cosmos/__init__.py"

[tool.hatch.build.targets.sdist]
include = [
"/cosmos",
]
include = ["/cosmos"]

[tool.hatch.build.targets.wheel]
packages = ["cosmos"]
@@ -175,51 +140,20 @@ matrix.airflow.dependencies = [
[tool.hatch.envs.tests.scripts]
freeze = "pip freeze"
type-check = "mypy cosmos"
test = 'pytest -vv --durations=0 . -m "not integration" --ignore=tests/test_example_dags.py --ignore=tests/test_example_dags_no_connections.py'
test-cov = """pytest -vv --cov=cosmos --cov-report=term-missing --cov-report=xml --durations=0 -m "not integration" --ignore=tests/test_example_dags.py --ignore=tests/test_example_dags_no_connections.py"""
# we install using the following workaround to overcome installation conflicts, such as:
# apache-airflow 2.3.0 and dbt-core [0.13.0 - 1.5.2] and jinja2>=3.0.0 because these package versions have conflicting dependencies
test-integration-setup = """pip uninstall -y dbt-postgres dbt-databricks dbt-vertica; \
rm -rf airflow.*; \
airflow db init; \
pip install 'dbt-core' 'dbt-databricks' 'dbt-postgres' 'dbt-vertica' 'openlineage-airflow'"""
test-integration = """rm -rf dbt/jaffle_shop/dbt_packages;
pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
-k 'not (sqlite or example_cosmos_sources or example_cosmos_python_models or example_virtualenv)'"""
test-integration-expensive = """pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
-k 'example_cosmos_python_models or example_virtualenv'"""
test-integration-sqlite-setup = """pip uninstall -y dbt-core dbt-sqlite openlineage-airflow openlineage-integration-common; \
rm -rf airflow.*; \
airflow db init; \
pip install 'dbt-core==1.4' 'dbt-sqlite<=1.4' 'dbt-databricks<=1.4' 'dbt-postgres<=1.4' """
test-integration-sqlite = """
pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
-k 'example_cosmos_sources or sqlite'"""
test = 'sh scripts/test/unit.sh'
test-cov = 'sh scripts/test/unit-cov.sh'
test-integration-setup = 'sh scripts/test/integration-setup.sh'
test-integration = 'sh scripts/test/integration.sh'
test-integration-expensive = 'sh scripts/test/integration-expensive.sh'
test-integration-sqlite-setup = 'sh scripts/test/integration-sqlite-setup.sh'
test-integration-sqlite = 'sh scripts/test/integration-sqlite.sh'
test-performance-setup = 'sh scripts/test/performance-setup.sh'
test-performance = 'sh scripts/test/performance.sh'

[tool.pytest.ini_options]
filterwarnings = [
"ignore::DeprecationWarning",
]
filterwarnings = ["ignore::DeprecationWarning"]
minversion = "6.0"
markers = [
"integration",
"sqlite"
]
markers = ["integration", "sqlite", "perf"]

######################################
# DOCS
@@ -233,7 +167,7 @@ dependencies = [
"sphinx-autobuild",
"sphinx-autoapi",
"openlineage-airflow",
"apache-airflow-providers-cncf-kubernetes>=5.1.1"
"apache-airflow-providers-cncf-kubernetes>=5.1.1",
]

[tool.hatch.envs.docs.scripts]
8 changes: 8 additions & 0 deletions scripts/test/integration-expensive.sh
@@ -0,0 +1,8 @@
pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
--ignore=tests/perf \
-k 'example_cosmos_python_models or example_virtualenv'
6 changes: 6 additions & 0 deletions scripts/test/integration-setup.sh
@@ -0,0 +1,6 @@
# we install using the following workaround to overcome installation conflicts, such as:
# apache-airflow 2.3.0 and dbt-core [0.13.0 - 1.5.2] and jinja2>=3.0.0 because these package versions have conflicting dependencies
pip uninstall -y dbt-postgres dbt-databricks dbt-vertica; \
rm -rf airflow.*; \
airflow db init; \
pip install 'dbt-core' 'dbt-databricks' 'dbt-postgres' 'dbt-vertica' 'openlineage-airflow'
4 changes: 4 additions & 0 deletions scripts/test/integration-sqlite-setup.sh
@@ -0,0 +1,4 @@
pip uninstall -y dbt-core dbt-sqlite openlineage-airflow openlineage-integration-common; \
rm -rf airflow.*; \
airflow db init; \
pip install 'dbt-core==1.4' 'dbt-sqlite<=1.4' 'dbt-databricks<=1.4' 'dbt-postgres<=1.4'
8 changes: 8 additions & 0 deletions scripts/test/integration-sqlite.sh
@@ -0,0 +1,8 @@
pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
--ignore=tests/perf \
-k 'example_cosmos_sources or sqlite'
9 changes: 9 additions & 0 deletions scripts/test/integration.sh
@@ -0,0 +1,9 @@
rm -rf dbt/jaffle_shop/dbt_packages;
pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
--ignore=tests/perf \
-k 'not (sqlite or example_cosmos_sources or example_cosmos_python_models or example_virtualenv)'
4 changes: 4 additions & 0 deletions scripts/test/performance-setup.sh
@@ -0,0 +1,4 @@
pip uninstall -y dbt-core dbt-sqlite openlineage-airflow openlineage-integration-common; \
rm -rf airflow.*; \
airflow db init; \
pip install 'dbt-core==1.4' 'dbt-sqlite<=1.4' 'dbt-databricks<=1.4' 'dbt-postgres<=1.4'
5 changes: 5 additions & 0 deletions scripts/test/performance.sh
@@ -0,0 +1,5 @@
pytest -vv \
-s \
-m 'perf' \
--ignore=tests/test_example_dags.py \
--ignore=tests/test_example_dags_no_connections.py
10 changes: 10 additions & 0 deletions scripts/test/unit-cov.sh
@@ -0,0 +1,10 @@
pytest \
-vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m "not (integration or perf)" \
--ignore=tests/perf \
--ignore=tests/test_example_dags.py \
--ignore=tests/test_example_dags_no_connections.py
7 changes: 7 additions & 0 deletions scripts/test/unit.sh
@@ -0,0 +1,7 @@
pytest \
-vv \
--durations=0 \
-m "not (integration or perf)" \
--ignore=tests/perf \
--ignore=tests/test_example_dags.py \
--ignore=tests/test_example_dags_no_connections.py