AWS ECS Logging very slow when lots of logging leading to task failure #42442
Labels
area:logging
area:providers
kind:bug
needs-triage
provider:amazon-aws
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.28.0 is affected
apache-airflow-providers-amazon==8.27.0 is not affected
Apache Airflow version
2.10.2
Operating System
Debian 12 bookworm
Deployment
Other Docker-based deployment
Deployment details
No response
What happened
After upgrading to Airflow 2.10.2, longer-running ECS tasks with significant logging started failing. The logs would keep appearing slowly in Airflow even though the ECS task had already completed. If fetching the logs took more than an hour longer than the task itself, the ECS task in Airflow would fail with an error that the ECS task was missing. This is because older tasks disappear from ECS (Fargate).
Looking at the changes, I came across https://github.com/apache/airflow/pull/41515/files, which added a 0.1-second sleep if the timestamps were the same. Looking further at the logs of the failing tasks, I saw two log timestamps on each line: one from the application, and another that was falling significantly behind it.
After rolling back the amazon provider to the previous version (8.27.0) while staying on Airflow 2.10.2, the issue went away.
Linked tickets #41515 #40875
What you think should happen instead
Logging should be submitted in a timely manner.
Could we go for a much shorter delay such as 0.001 seconds?
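As a rough illustration of why the size of the delay matters, here is a back-of-the-envelope sketch. It assumes the worst case where every log event triggers one sleep; I have not verified how often the same-timestamp condition actually fires in the provider, and the event count and the `added_delay_seconds` helper are hypothetical.

```python
def added_delay_seconds(num_events: int, sleep_per_event: float) -> float:
    """Worst-case time spent sleeping, assuming one sleep per log event."""
    return num_events * sleep_per_event


if __name__ == "__main__":
    # Hypothetical log volume for a single ECS task.
    events = 50_000
    for sleep in (0.1, 0.001):
        delay = added_delay_seconds(events, sleep)
        print(f"sleep={sleep}s: up to {delay / 60:.1f} minutes of added delay")
```

Under that assumption, a 0.1-second sleep pushes a task with tens of thousands of log lines past the one-hour mark, while 0.001 seconds keeps the added delay under a minute.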
How to reproduce
Run an ECS task that produces so much log output that fetching the logs in Airflow takes longer than the task itself takes to run.
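Something like the following could serve as the container command for such a task. This is only a sketch, not the workload I actually ran: the `emit_lines` helper and the line count are hypothetical, and the volume may need tuning for your setup.

```python
import sys
import time


def emit_lines(count: int, stream=sys.stdout) -> int:
    """Write `count` log lines as fast as possible and return how many were written."""
    for i in range(count):
        # Include an application-side timestamp so it can be compared
        # with the time at which the line shows up in Airflow.
        stream.write(f"{time.time():.3f} log line {i}\n")
    stream.flush()
    return count


if __name__ == "__main__":
    # Arbitrary volume chosen for illustration only.
    emit_lines(100_000)
```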
Anything else
No response
Are you willing to submit PR?
Code of Conduct