Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Task Duration view in case of manual DAG runs only (#22015) #29195

Merged
merged 5 commits into from
Feb 25, 2023

Conversation

duke8585
Copy link
Contributor

Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by nums in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed. With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes.

closes: #22015

@boring-cyborg
Copy link

boring-cyborg bot commented Jan 27, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@Taragolis
Copy link
Contributor

@duke8585, thanks for contribution, however we also need tests for this changes, for more details see:

@duke8585
Copy link
Contributor Author

duke8585 commented Jan 28, 2023

@duke8585, thanks for contribution, however we also need tests for this changes, for more details see:

@Taragolis thank you. to my knowledge the function lacked tests to begin with and i was wondering how to test this as it requires a backend db with DAG runs. but i see some helpful examples in the reference you provided :)

i will see what we can do about that.

@Taragolis
Copy link
Contributor

Taragolis commented Jan 28, 2023

to my knowledge the function lacked tests to begin with and i was wondering how to test this as it requires a backend db with DAG runs

Yeah it required db backend, if you run in venv I thought it would be enough to use SQLite, Airflow created it by default if no backend provided (${AIRFLOW_HOME}/airflow.db or ${AIRFLOW_HOME}/unittests.db), if test DB not initialised you could run pytest with flag --with-db-init

If you run tests inside of Breeze then you already have DB backend by default i think it is Postgres.

And at least PyCharm/IDEA correctly detected all tests out of the box, so you could easily run it in it (just make sure you configure venv)

BTW, @boring-cyborg also provide useful links for First time contributor as first message in this PR

@duke8585
Copy link
Contributor Author

duke8585 commented Feb 6, 2023

hello everyone,
i do not understand why the actions fail at the moment. is this s related to my changes? how to continue? thanks :)

@duke8585
Copy link
Contributor Author

hello again,
running the breeze static-checks --all-files --show-diff-on-failure --color always locally works for me. i do not get where the error occurs. help is greatly appreciated :)

@potiuk
Copy link
Member

potiuk commented Feb 10, 2023

Static checks are fine. Those arenot static checks that are failing but some other tests "hang" (that seems to happen quite often with multiple PRs during the last few hours and we are still tracing the cause of it

@duke8585
Copy link
Contributor Author

just did another manual local test and the UI works as i expect it to.

@potiuk
Copy link
Member

potiuk commented Feb 20, 2023

LGTM - anyone else :) ?

Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion for remove redundant part of code

airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
airflow/models/dag.py Outdated Show resolved Hide resolved
duke8585 and others added 5 commits February 22, 2023 15:46
Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by `nums` in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed.
With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes.
With this new fix, the determination of the min_date is done on a set of
execution_date values - via querying - limited by `num`and taking the
last element of it. the fallback value if none are found is simply the
base_date. this way, the invoked get_task_instances cannot default to
30 days prior to the current time.

Furthermore, the query itself is no longer confined to non-manual runs,
but includes all DAGRun types.

More context:

During writing the tests, it became evident
that the current implementation of the get_task_instances_before
method has two problems:
1) With the implementation of a limit + offset determination of the
min_date, no results can occur:
  * Either due to less DAGRuns in the DB than `num` or
  * Because of the constraining to non-manual runs (see more below).
Consequently, the fallback in the invoked get_task_instances is then a
-30d from datetime.now(), so an entirely different time dimension than
the one selected in the UI (execution_date).
2) With the explicit omission of manual runs in the query, in case of n
manual runs and 1 scheduled one before the base_date (e.g.
S-M-M-...-M-M-BD), the scheduled one + n manual runs are returned. in
the case of mainly triggered/manual DAGRuns, this leads to more results
displayed than expected. in case of solely manual DAGRuns, the method
will return no results and hence default to the above mentioned
fallback, which operates on a different timeline. prior tp this fix,
there was even a fallback to the start of the current epochal period,
i.e. showing all DAGRuns in the DB.

With the previously suggested non-breaking fix, it was not possible to
avoid this scenario, with any scheduled run found, the fallback was
never reached.

The suggested changes may be considered
breaking. the changes are confined to the scope of the
get_task_instances_before method and usage is only in the views.py for
the duration, tries and landing times, so where the changes  meant to
fix the existing behavior.
Co-authored-by: Andrey Anshin <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
@potiuk
Copy link
Member

potiuk commented Feb 25, 2023

Closed/reopened to rebuild

@potiuk potiuk merged commit 8b8552f into apache:main Feb 25, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 25, 2023

Awesome work, congrats on your first merged pull request!

@eladkal eladkal added this to the Airflow 2.5.2 milestone Feb 26, 2023
@duke8585
Copy link
Contributor Author

thanks everyone. i learned quite some things here :)

@pierrejeambrun pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Feb 27, 2023
pierrejeambrun pushed a commit that referenced this pull request Mar 7, 2023
…29195)

* Fixing Task Duration view in case of manual DAG runs only (#22015)

Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by `nums` in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed.
With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes.

* Fixing get_task_instances_before method and adding a unit test

With this new fix, the determination of the min_date is done on a set of
execution_date values - via querying - limited by `num`and taking the
last element of it. the fallback value if none are found is simply the
base_date. this way, the invoked get_task_instances cannot default to
30 days prior to the current time.

Furthermore, the query itself is no longer confined to non-manual runs,
but includes all DAGRun types.

More context:

During writing the tests, it became evident
that the current implementation of the get_task_instances_before
method has two problems:
1) With the implementation of a limit + offset determination of the
min_date, no results can occur:
  * Either due to less DAGRuns in the DB than `num` or
  * Because of the constraining to non-manual runs (see more below).
Consequently, the fallback in the invoked get_task_instances is then a
-30d from datetime.now(), so an entirely different time dimension than
the one selected in the UI (execution_date).
2) With the explicit omission of manual runs in the query, in case of n
manual runs and 1 scheduled one before the base_date (e.g.
S-M-M-...-M-M-BD), the scheduled one + n manual runs are returned. in
the case of mainly triggered/manual DAGRuns, this leads to more results
displayed than expected. in case of solely manual DAGRuns, the method
will return no results and hence default to the above mentioned
fallback, which operates on a different timeline. prior tp this fix,
there was even a fallback to the start of the current epochal period,
i.e. showing all DAGRuns in the DB.

With the previously suggested non-breaking fix, it was not possible to
avoid this scenario, with any scheduled run found, the fallback was
never reached.

The suggested changes may be considered
breaking. the changes are confined to the scope of the
get_task_instances_before method and usage is only in the views.py for
the duration, tries and landing times, so where the changes  meant to
fix the existing behavior.

* Update airflow/models/dag.py

Co-authored-by: Andrey Anshin <[email protected]>

* Update airflow/models/dag.py

Co-authored-by: Andrey Anshin <[email protected]>

* Update airflow/models/dag.py

Co-authored-by: Andrey Anshin <[email protected]>

---------

Co-authored-by: Andrey Anshin <[email protected]>
(cherry picked from commit 8b8552f)
pierrejeambrun pushed a commit that referenced this pull request Mar 8, 2023
…29195)

* Fixing Task Duration view in case of manual DAG runs only (#22015)

Currently, if a DAG history is solely comprised of manual runs, they cannot be filtered down by `nums` in the duration view, because in this case, the min_date defaults to the current utc_epoch. Consequently, an unlimited number of DAG runs is displayed.
With the implemented fallback, the behavior for mixed (scheduled + manual) DAG runs is similar to the default behavior, as it is for manual-only runs. It was implemented this way to avoid breaking changes.

* Fixing get_task_instances_before method and adding a unit test

With this new fix, the determination of the min_date is done on a set of
execution_date values - via querying - limited by `num`and taking the
last element of it. the fallback value if none are found is simply the
base_date. this way, the invoked get_task_instances cannot default to
30 days prior to the current time.

Furthermore, the query itself is no longer confined to non-manual runs,
but includes all DAGRun types.

More context:

During writing the tests, it became evident
that the current implementation of the get_task_instances_before
method has two problems:
1) With the implementation of a limit + offset determination of the
min_date, no results can occur:
  * Either due to less DAGRuns in the DB than `num` or
  * Because of the constraining to non-manual runs (see more below).
Consequently, the fallback in the invoked get_task_instances is then a
-30d from datetime.now(), so an entirely different time dimension than
the one selected in the UI (execution_date).
2) With the explicit omission of manual runs in the query, in case of n
manual runs and 1 scheduled one before the base_date (e.g.
S-M-M-...-M-M-BD), the scheduled one + n manual runs are returned. in
the case of mainly triggered/manual DAGRuns, this leads to more results
displayed than expected. in case of solely manual DAGRuns, the method
will return no results and hence default to the above mentioned
fallback, which operates on a different timeline. prior tp this fix,
there was even a fallback to the start of the current epochal period,
i.e. showing all DAGRuns in the DB.

With the previously suggested non-breaking fix, it was not possible to
avoid this scenario, with any scheduled run found, the fallback was
never reached.

The suggested changes may be considered
breaking. the changes are confined to the scope of the
get_task_instances_before method and usage is only in the views.py for
the duration, tries and landing times, so where the changes  meant to
fix the existing behavior.

* Update airflow/models/dag.py

Co-authored-by: Andrey Anshin <[email protected]>

* Update airflow/models/dag.py

Co-authored-by: Andrey Anshin <[email protected]>

* Update airflow/models/dag.py

Co-authored-by: Andrey Anshin <[email protected]>

---------

Co-authored-by: Andrey Anshin <[email protected]>
(cherry picked from commit 8b8552f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow showing more than 25 last DAG runs in the task duration view
5 participants