Dataset Next Trigger Modal Not Populating Latest Update #26892

Closed
1 of 2 tasks
tseruga opened this issue Oct 5, 2022 · 3 comments · Fixed by #29441
Labels: area:webserver (Webserver related Issues), kind:bug (This is a clearly a bug)

Comments


tseruga commented Oct 5, 2022

Apache Airflow version

2.4.1

What happened

When using dataset scheduling, it isn't obvious which datasets a downstream dataset consumer is awaiting in order for the DAG to be scheduled.

I assume this is supposed to be solved by the Latest Update column in the modal that opens when selecting "x of y datasets updated", but that column isn't being populated.

image

Although one of the datasets has been produced, there is no data in the Latest Update column of the modal.

In the above example, both datasets have been produced more than once.

image

image

What you think should happen instead

The Latest Update column should be populated with the latest update timestamp for each dataset required to schedule a downstream dataset-consuming DAG.

Ideally there would be some form of highlighting on the "missing" datasets for quick visual feedback when DAGs have a large number of datasets required for scheduling.

How to reproduce

  1. Create a DAG (or 2 individual DAGs) that produces 2 datasets
  2. Produce both datasets
  3. Then produce only one dataset
  4. From the home screen, click the "x of y datasets updated" button and check the modal.

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@tseruga tseruga added area:core kind:bug This is a clearly a bug labels Oct 5, 2022
@uranusjr uranusjr added area:UI Related to UI/UX. For Frontend Developers. and removed area:core labels Oct 6, 2022

tseruga commented Oct 6, 2022

Just to expand upon the state that is causing this issue a bit more...

The DatasetEvent model looks correct in the database:
image

I'm unsure why this query isn't populating the lastUpdate field in the API response:

airflow/airflow/www/views.py

Lines 3447 to 3451 in 8898db9

for info in session.query(
    DatasetModel.id,
    DatasetModel.uri,
    func.max(DatasetEvent.timestamp).label("lastUpdate"),
)

image

@bbovenzi
Contributor

@tseruga could you share the created_at time for the most recent DAG run? I wonder if the DAG run was somehow created after the next update to test_dataset.

airflow/airflow/www/views.py

Lines 3461 to 3468 in 8898db9

.join(
    DatasetEvent,
    and_(
        DatasetEvent.dataset_id == DatasetModel.id,
        DatasetEvent.timestamp > DatasetDagRunQueue.created_at,
    ),
    isouter=True,
)
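
To make the effect of that join condition concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables `dataset`, `ddrq`, and `dataset_event` are simplified, hypothetical stand-ins for DatasetModel, DatasetDagRunQueue, and DatasetEvent, not Airflow's real schema, and the timestamps are invented. The scenario in which an event is stamped slightly earlier than the queue record's created_at is an assumption for illustration, not a confirmed diagnosis. The sketch only shows that an outer join with a strict `timestamp > created_at` condition leaves the aggregated lastUpdate empty whenever no event is strictly later than the queue record, or when there is no queue record at all.

```python
import sqlite3

# Hypothetical, simplified schema; not Airflow's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT);
    CREATE TABLE ddrq (dataset_id INTEGER, created_at TEXT);
    CREATE TABLE dataset_event (dataset_id INTEGER, timestamp TEXT);

    INSERT INTO dataset VALUES (1, 'test_dataset'), (2, 'other_dataset');
    -- Assumed scenario: the event is stamped just before the queue record.
    INSERT INTO dataset_event VALUES (1, '2022-10-05 12:00:00.000');
    INSERT INTO ddrq VALUES (1, '2022-10-05 12:00:00.050');
    -- Dataset 2 has an older event and no pending queue record.
    INSERT INTO dataset_event VALUES (2, '2022-10-04 09:00:00.000');
    """
)

# Mirrors the shape of the quoted query: two outer joins, then MAX().
QUERY = """
SELECT d.uri, MAX(e.timestamp) AS lastUpdate
FROM dataset AS d
LEFT JOIN ddrq AS q ON q.dataset_id = d.id
LEFT JOIN dataset_event AS e
    ON e.dataset_id = d.id AND e.timestamp > q.created_at
GROUP BY d.id, d.uri
ORDER BY d.uri
"""


def latest_updates(connection):
    """Return (uri, lastUpdate) rows for the simplified query."""
    return connection.execute(QUERY).fetchall()


before = latest_updates(conn)

# An event strictly later than created_at does satisfy the join condition.
conn.execute("INSERT INTO dataset_event VALUES (1, '2022-10-05 12:05:00.000')")
after = latest_updates(conn)

print(before)
print(after)
```

In this sketch the first call returns no lastUpdate for either dataset even though both have events, and only the later event on test_dataset shows up in the second call. Whether the same ordering of timestamps occurs in a real deployment would need to be checked against the actual DatasetDagRunQueue and DatasetEvent rows.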

@bbovenzi bbovenzi added area:webserver Webserver related Issues and removed area:UI Related to UI/UX. For Frontend Developers. labels Oct 20, 2022
@michaelmicheal
Contributor

Feel free to assign this to me. I agree this should be more robust and clearer in the UI.
