Webserver does not fetch server logs when there are remote logs #41164

Closed
1 of 2 tasks
wolfier opened this issue Jul 31, 2024 · 4 comments
Labels
area:core area:logging area:webserver Webserver related Issues kind:bug This is a clearly a bug

Comments

@wolfier
Contributor

wolfier commented Jul 31, 2024

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.9.3

What happened?

The webserver is not reaching out to the triggerer log server for the corresponding trigger logs of a deferred task instance. The webserver only reads from the worker / triggerer log server when there are no local logs or remote logs. This behaviour was introduced in #39177.

When the task instance is in a non-terminal state and has remote logs, the behaviour no longer aligns with the expectation of live logs described in the documentation for Serving logs from workers and triggerer.

A specific use case is deferrable operators, which essentially have two executions.

  1. First execution is to submit the trigger and put the task into a deferred state
  2. Second execution is to process the trigger event

After the first execution, the task log is pushed to the remote location. From then on, the task log view sees the log in the remote location and fetches it as expected, but this also means the webserver will not reach out to the triggerer for logs.
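
The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch of the behaviour as described in this issue, not Airflow's actual log handler code; `LogSources` and `sources_to_read` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogSources:
    """Which locations currently hold logs for a task attempt (hypothetical model)."""

    has_local: bool = False
    has_remote: bool = False


def sources_to_read(sources: LogSources) -> list[str]:
    """Model of the reported behaviour: the worker / triggerer log server
    is contacted only when neither remote nor local logs were found."""
    selected = []
    if sources.has_remote:
        selected.append("remote")
    if sources.has_local:
        selected.append("local")
    if not selected:
        # Stands in for the HTTP request to the worker / triggerer log server.
        selected.append("served")
    return selected


# Before the first execution finishes, nothing has been uploaded yet.
before_defer = sources_to_read(LogSources())
# After the first execution, the task log is in remote storage and the task is deferred.
while_deferred = sources_to_read(LogSources(has_remote=True))

print(before_defer)    # ['served']
print(while_deferred)  # ['remote']
```

In this model, once the remote log exists the `"served"` source is never selected, which matches the symptom reported here: trigger logs do not appear while the task is deferred.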

What you think should happen instead?

If the task instance is deferred and has remote logs, the webserver should still reach out to the triggerer log server.

The behaviour was introduced by #39177 so that task instances in a terminal state can continue to fetch logs from the log server if there are no remote logs or local logs. The user stated that their logs are stored in persistent storage on their worker, which is why they want to allow server log fetching when there is no remote logging.

I think the log reading code needs a dedicated logical path for the case where there are no remote or local logs, covering deployments without remote logging where logs are not stored on the webserver.
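
One way to read the two requirements above is as a single decision rule. The sketch below is an assumption about how they could be combined, not the change that was eventually merged; the state strings, `LIVE_STATES`, and `should_fetch_served_logs` are illustrative names only.

```python
# Assumed set of non-terminal states in which live logs are expected.
LIVE_STATES = frozenset({"running", "deferred"})


def should_fetch_served_logs(state: str, has_local: bool, has_remote: bool) -> bool:
    """Return True when the webserver should query the worker / triggerer log server."""
    if state in LIVE_STATES:
        # Live task: remote logs may only cover an earlier execution, so keep fetching.
        return True
    # Finished task: fall back to the log server only if nothing was found elsewhere,
    # e.g. logs kept on a worker's persistent volume without remote logging.
    return not has_local and not has_remote


# Deferred task whose first execution already uploaded a remote log.
print(should_fetch_served_logs("deferred", has_local=False, has_remote=True))  # True
# Finished task with remote logs: no extra HTTP request is needed.
print(should_fetch_served_logs("success", has_local=False, has_remote=True))   # False
# Finished task with no remote or local logs: the use case behind #39177.
print(should_fetch_served_logs("success", has_local=False, has_remote=False))  # True
```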

How to reproduce

Set up a deployment with remote logging and run a deferrable task.

Operating System

n/a

Versions of Apache Airflow Providers

n/a

Deployment

Astronomer

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@wolfier wolfier added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jul 31, 2024
@wolfier
Contributor Author

wolfier commented Jul 31, 2024

@kahlstrm / @RNHTTR

Can you speak more about the change introduced in #39177, in case my interpretation is insufficient or incorrect?

@dosubot dosubot bot added the area:webserver Webserver related Issues label Jul 31, 2024
@kahlstrm
Contributor

kahlstrm commented Aug 1, 2024

@kahlstrm / @RNHTTR

Can you speak more about the change introduced in #39177 in case my interpretation is insufficient / incorrect.

The webserver only reads from the worker / triggerer log server when there are no local logs or remote logs. This behaviour was introduced in #39177.

To clarify on this point, #39177 introduced this particular behaviour as an alternative implementation of #32561, which entirely removed fetching logs from the worker / triggerer log server for past task runs. #39177 was implemented to retain the desired behaviour of #32561, namely not triggering the HTTP request when remote logs are found, while still supporting our use case of storing the logs on the worker with a persistent volume.

The deferred state logic was kept as close as possible to the previous implementations. However, as became evident in #39496 (comment), adding tests for the deferred state caused test flakiness with unexpected results. This may have caused a regression in the deferred state, which was untested in our use case, whereas viewing previous task attempts was confirmed to be working as expected again after #39177.

If this is deemed problematic or a regression, going back to the behaviour prior to #32561 would be fine, at least for our use case, as that PR initially introduced the behaviour of not serving the worker / triggerer logs in certain circumstances.

@shahar1 shahar1 added area:logging and removed needs-triage label for new issues that we didn't triage yet labels Aug 2, 2024
@kahlstrm
Contributor

kahlstrm commented Aug 9, 2024

This is probably resolved by #41272?

@wolfier
Contributor Author

wolfier commented Sep 5, 2024

Closed by #41272 as the original behaviour is restored.

@wolfier wolfier closed this as completed Sep 5, 2024