AIP-44 make database isolation mode work in Breeze #40894

Merged (1 commit) on Jul 20, 2024


potiuk (Member) commented Jul 19, 2024

With this PR, it is possible to get a working "DB isolation" setup with the Celery executor. In my tests it performs comparably to running without DB isolation.

Things changed here:

  • remove the "schedule_downstream_tasks" endpoint. It cannot work at the moment, because the DAG object is dropped during serialization, and the method needs the DAG to calculate which downstream tasks to schedule.

  • when we force direct DB access in DB isolation mode, we log a message saying that the component is switching to using the DB. We also make sure to remove the DB configuration in case it is set (this lets tests run in the Breeze environment with more certainty).

  • the detection of whether to force direct DB access is made in _main; this way, regular commands run in Breeze (migrate, user, etc.) can use the DB while initializing the environment, and actions can be logged to the DB or via RPC calls.

  • improved diagnostics
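A minimal sketch of the per-command decision described in the list above, under a simplified model: a hypothetical configure_db_access() helper receives the CLI subcommand and an isolation flag; trusted commands (for example db or users) keep direct DB access and log that they are doing so, while isolated components such as the worker have the metadata-DB connection string removed from their environment so they must go through RPC. The helper name and the UNTRUSTED_SUBCOMMANDS set are invented for illustration. Which commands count as isolated, and exactly which configuration gets removed, are assumptions and not a description of the PR's actual code; only the AIRFLOW__DATABASE__SQL_ALCHEMY_CONN variable name is Airflow's own.

```python
import logging
from typing import MutableMapping

log = logging.getLogger(__name__)

# Hypothetical set of components that must reach the DB only via RPC
# (the internal API) when DB isolation is enabled.
UNTRUSTED_SUBCOMMANDS = {"worker", "dag-processor", "triggerer"}

# Airflow's environment variable for the metadata DB connection string.
DB_CONN_ENV = "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN"


def configure_db_access(
    subcommand: str, isolation_enabled: bool, env: MutableMapping[str, str]
) -> str:
    """Return "db" or "rpc" for a CLI subcommand (hypothetical helper).

    The decision is made once, at CLI entry, instead of inside each component.
    """
    if not isolation_enabled:
        # Without isolation every command talks to the DB directly.
        return "db"
    if subcommand in UNTRUSTED_SUBCOMMANDS:
        # Assumption: scrub any DB connection string so the isolated
        # component cannot silently bypass RPC.
        env.pop(DB_CONN_ENV, None)
        return "rpc"
    # Trusted commands (e.g. "db", "users") keep direct access; logging the
    # switch makes it visible when diagnosing a setup.
    log.info("DB isolation mode: forcing direct DB access for %r", subcommand)
    return "db"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    fake_env = {DB_CONN_ENV: "postgresql://user:pass@host/airflow"}
    print(configure_db_access("worker", True, fake_env), fake_env)  # rpc {}
    print(configure_db_access("db", True, fake_env))  # logs, then prints: db
```

Passing os.environ instead of a plain dict would also unset the variable for the current process and any child processes it starts, which is the point of scrubbing it before an isolated component runs.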



potiuk (Member, Author) commented Jul 19, 2024

I ran quite a few DAGs in "DB isolation mode", and with these changes the Celery worker runs them nicely (and it does not even seem very slow). For now the "mini scheduler" is disabled; it could likely be brought back if we decide to serialize the DAG (currently we don't).
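Why the mini scheduler depends on the DAG being serialized can be illustrated with a tiny, hypothetical model. Dag, TaskInstanceStub and downstream_ready are invented names, not Airflow's API, and the real schedule_downstream_tasks logic is considerably more involved. The sketch only shows the core constraint: finding the downstream tasks that became ready requires walking the DAG's dependency graph, so when a task instance arrives over RPC without its DAG, the step has to be skipped and left to the regular scheduler.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Dag:
    # Hypothetical minimal DAG model: task_id -> set of upstream task_ids.
    upstream: Dict[str, Set[str]]


@dataclass
class TaskInstanceStub:
    task_id: str
    # None models a task instance whose DAG was dropped during serialization.
    dag: Optional[Dag] = None


def downstream_ready(ti: TaskInstanceStub, finished: Set[str]) -> List[str]:
    """Return direct downstream task_ids whose upstreams have all finished."""
    if ti.dag is None:
        # No dependency graph to walk: skip "mini scheduling" entirely.
        return []
    done = finished | {ti.task_id}
    return sorted(
        task_id
        for task_id, ups in ti.dag.upstream.items()
        if ti.task_id in ups and ups <= done and task_id not in done
    )
```

For example, with upstream = {"a": set(), "b": {"a"}, "c": {"a", "b"}}, finishing a yields ["b"], while the same call on a stub that lost its DAG yields [].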

potiuk (Member, Author) commented Jul 19, 2024

breeze start-airflow --database-isolation --executor CeleryExecutor --load-default-connections --load-example-dags

Review threads on airflow/__main__.py (2, outdated and resolved)
jscheffl (Contributor) left a comment:

Cool!

potiuk force-pushed the breeze-working-in-isolation-mode branch from 017d434 to 2810bc0 on July 20, 2024 07:18
Co-authored-by: Vincent
potiuk force-pushed the breeze-working-in-isolation-mode branch from 2810bc0 to 07e5b10 on July 20, 2024 17:18
potiuk merged commit 6684481 into apache:main on Jul 20, 2024
80 checks passed
potiuk deleted the breeze-working-in-isolation-mode branch on July 20, 2024 18:43
ephraimbuddy added the changelog:skip label (changes that should be skipped from the changelog: CI, tests, etc.) on Jul 22, 2024
ephraimbuddy added this to the Airflow 2.10.0 milestone on Jul 23, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
Labels: area:CLI, area:dev-tools, area:providers, area:serialization, area:webserver, changelog:skip, provider:fab