
AWS ECS Logging very slow when lots of logging leading to task failure #42442

Open
2 tasks done
smsm1-ito opened this issue Sep 24, 2024 · 2 comments
Labels
area:logging, area:providers, kind:bug, needs-triage, provider:amazon-aws

Comments

@smsm1-ito
Contributor

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.28.0 is affected

apache-airflow-providers-amazon==8.27.0 is not affected

Apache Airflow version

2.10.2

Operating System

Debian 12 bookworm

Deployment

Other Docker-based deployment

Deployment details

No response

What happened

After upgrading to Airflow 2.10.2, longer-running ECS tasks with significant logging started failing. The logs kept appearing slowly in Airflow even though the ECS task had already completed. If relaying the logs took more than an hour longer than the task itself, the Airflow ECS task failed with an error saying the ECS task was missing. This happens because older tasks disappear from ECS (Fargate).

Looking through the changes, I came across https://github.com/apache/airflow/pull/41515/files, which added a 0.1 second sleep whenever consecutive log events had the same timestamp. Looking further at the logs of the failing tasks, each line carried two log times, and one of them fell significantly further behind the other (the one from the application) as the task went on.
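To see why a per-event sleep compounds under heavy logging, here is a simplified model of the pattern described above: events are relayed in order, and whenever an event's timestamp equals the previous event's, the relay pauses before emitting it. This is a sketch under assumptions, not the provider's actual code. The function name `relay_events` is hypothetical, the event shape mirrors CloudWatch Logs `GetLogEvents` events (`timestamp` in epoch milliseconds, `message`), and the sleep is injectable (a no-op by default) so the cost can be counted rather than waited out.

```python
from typing import Any, Callable, Iterable, List, Mapping, Optional


def relay_events(
    events: Iterable[Mapping[str, Any]],
    emit: Callable[[str], None],
    delay: float = 0.1,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> int:
    """Hypothetical simplified model (not the provider's code): relay events in
    order, pausing `delay` seconds before any event whose timestamp equals the
    previous event's. Returns how many pauses were taken."""
    pauses = 0
    prev_ts: Optional[int] = None
    for event in events:
        ts = event["timestamp"]  # CloudWatch timestamps are epoch milliseconds
        if ts == prev_ts:
            sleep(delay)  # no-op by default; pass time.sleep to really wait
            pauses += 1
        emit(event["message"])
        prev_ts = ts
    return pauses


# Hypothetical burst: 1,000 lines logged within the same millisecond.
burst = [{"timestamp": 1727136000000, "message": f"line {i}"} for i in range(1000)]
out: List[str] = []
pauses = relay_events(burst, out.append)
print(pauses, "pauses, about", round(pauses * 0.1, 1), "s of added sleep")
# prints: 999 pauses, about 99.9 s of added sleep
```

Because CloudWatch timestamps only have millisecond resolution, a chatty application can produce long runs of identical timestamps. In this model each repeated timestamp costs a full `delay`, so relay throughput for such runs is capped at roughly 1 / `delay` events per second (about 10 per second at 0.1 s).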

Rolling back the Amazon provider to the previous version (8.27.0) while staying on Airflow 2.10.2 made the issue go away.

Linked tickets #41515 #40875

What you think should happen instead

Logging should be submitted in a timely manner.

Could we go for a much shorter delay such as 0.001 seconds?
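A back-of-the-envelope comparison of the two delays, as a sketch under stated assumptions: a long task logs a hypothetical 50,000 lines that each share a timestamp with the line before, each such line costs one pause, and API and processing time are ignored. The function name `added_lag_seconds` and the volume are illustrative, not measurements from the failing tasks.

```python
def added_lag_seconds(repeated_timestamp_events: int, delay: float) -> float:
    """Extra wall-clock time spent sleeping if each event that repeats the
    previous event's timestamp costs one pause of `delay` seconds
    (simplified model; ignores API and processing time)."""
    return repeated_timestamp_events * delay


# Hypothetical volume: 50,000 lines that share a timestamp with the line before.
n = 50_000
for delay in (0.1, 0.001):
    lag = added_lag_seconds(n, delay)
    print(f"delay={delay} s -> {lag:.0f} s (~{lag / 60:.1f} min) of added lag")
```

Under these assumptions, the 0.1 s delay adds about 5,000 s (roughly 83 minutes), which is more than the hour after which the ECS task was reported missing, while 0.001 s adds about 50 s.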

How to reproduce

Run an ECS task that produces so much log output that relaying the logs takes much longer than the task itself runs.
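One way to get such a task, sketched under assumptions: run a container whose entrypoint writes bursts of lines to stdout in quick succession, so that, assuming the task uses the `awslogs` log driver to ship stdout to CloudWatch Logs, many events are likely to share a millisecond timestamp. The script below is hypothetical (the name `emit_bursts`, the burst size, and the pause are arbitrary choices) and is not part of the original report.

```python
import sys
import time
from typing import TextIO


def emit_bursts(
    bursts: int,
    lines_per_burst: int,
    pause: float,
    stream: TextIO = sys.stdout,
) -> int:
    """Hypothetical workload for reproducing the lag: write `bursts` batches of
    `lines_per_burst` lines back-to-back, sleeping `pause` seconds between
    batches. Lines within a batch are written in quick succession, so many are
    likely to share a millisecond timestamp once shipped to CloudWatch Logs.
    Returns the number of lines written."""
    written = 0
    for b in range(bursts):
        for i in range(lines_per_burst):
            stream.write(f"burst {b} line {i}\n")
            written += 1
        stream.flush()  # push each batch out immediately
        time.sleep(pause)
    return written


# Example container entrypoint call (arbitrary sizing, roughly one minute of
# runtime producing 60,000 lines):
#   emit_bursts(bursts=60, lines_per_burst=1000, pause=1.0)
```

Keeping the task's own runtime short while emitting many lines per burst makes the gap between task completion and the last relayed log line easy to observe.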

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

smsm1-ito added the area:providers, kind:bug and needs-triage labels Sep 24, 2024

boring-cyborg bot commented Sep 24, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

dosubot bot added the area:logging and provider:amazon-aws labels Sep 24, 2024
smsm1-ito added a commit to smsm1-ito/airflow that referenced this issue Sep 24, 2024
Longer running tasks with lots of logs were slow to write all of the logs to AWS. This reduces the sleep delay to prevent the issue.
@smsm1-ito
Contributor Author

I've created a pull request for this: #42449

vincbeck pushed a commit that referenced this issue Sep 25, 2024
…42449)

Longer running tasks with lots of logs were slow to write all of the logs to AWS. This reduces the sleep delay to prevent the issue.