Logs disappearing when a task completes as failed and is up to retry #31764

Closed
1 of 2 tasks
iJanki-gr opened this issue Jun 7, 2023 · 13 comments
Labels
area:core, kind:bug, needs-triage, pending-response, stale

Comments

@iJanki-gr
Contributor

iJanki-gr commented Jun 7, 2023

Apache Airflow version

2.6.1

What happened

(may be related to issue #31054)

I'm experiencing an issue with logs. The setup is Airflow running in k8s with the official Helm chart. There is no remote logging; logs are served by the workers.

Everything seems to work (after the bug mentioned in #31054 was fixed) except for failed tasks. While the task is running, the logs show up. As soon as the task finishes as FAILED and is up for retry, the logs disappear.

What you think should happen instead

Logs should still be visible after the task completes.

How to reproduce

Run Airflow with no remote logging in k8s with the Helm chart.
Run a task that fails and has retries configured (a minimal sketch follows below).
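
As an illustration only, a DAG along the lines below should exercise the scenario: the task always fails and is retried, and while it sits in up_for_retry its logs reportedly disappear from the UI. The DAG id, schedule, and retry settings are made up for this sketch, not taken from the reporter's environment.

# Hypothetical reproduction sketch; names and values are arbitrary.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def always_fail():
    # Raise so the task instance ends up FAILED and then up_for_retry.
    raise RuntimeError("failing on purpose to trigger a retry")


with DAG(
    dag_id="logs_disappear_on_retry_repro",
    start_date=datetime(2023, 6, 1),
    schedule=None,
    catchup=False,
):
    PythonOperator(
        task_id="always_fail",
        python_callable=always_fail,
        retries=3,
        retry_delay=timedelta(minutes=5),
    )

Trigger the DAG manually and open the task's log view while the task instance is in up_for_retry.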

Operating System

Debian GNU/Linux 11 (bullseye) (docker image)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.0.0
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==6.1.0
apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-docker==3.6.0
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==10.0.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-microsoft-azure==6.0.0
apache-airflow-providers-mysql==5.0.0
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-snowflake==4.0.5
apache-airflow-providers-sqlite==3.3.2
apache-airflow-providers-ssh==3.6.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@iJanki-gr added the area:core, kind:bug, and needs-triage labels on Jun 7, 2023
@iJanki-gr changed the title from "Logs disappearing when a task completes as failed" to "Logs disappearing when a task completes as failed and is up to retry" on Jun 7, 2023
@itsallsame

same problem ~

@hussein-awala
Member

Do you have log persistence enabled in the chart?

@iJanki-gr
Contributor Author

airflow.logs.persistence is false in the chart, but I do see a PVC for worker logs and the mounts in the worker pod:

  volumes:
  - name: logs
    persistentVolumeClaim:
      claimName: logs-airflow-bd-dev-worker-0
...
    volumeMounts:
    - mountPath: /opt/airflow/logs
      name: logs

I'm not sure what the difference is with airflow.logs.persistence=true, but I cannot enable that because it seems to require a ReadWriteMany volume, which I don't have in my environment.
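
For anyone comparing settings, here is a rough sketch of the two chart options involved, with heavy caveats: the key names below follow recent versions of the official apache-airflow Helm chart and should be checked against the values.yaml of the chart version actually deployed (nest them under airflow: if the chart is used as a subchart, as the airflow.logs.persistence reference above suggests); the storage class name is a placeholder.

# workers.persistence gives each worker its own PVC via the StatefulSet's
# volumeClaimTemplates, which is where a claim such as
# "logs-airflow-bd-dev-worker-0" comes from. ReadWriteOnce is enough because
# only that worker mounts it, so the webserver has to fetch logs through the
# worker log server instead of reading the volume directly.
workers:
  persistence:
    enabled: true
    size: 100Gi

# logs.persistence is a single shared log volume mounted by several components
# at once, which is why it generally needs a ReadWriteMany-capable storage class.
logs:
  persistence:
    enabled: false                             # setting used in this report
    # enabled: true                            # shared-volume alternative
    # size: 100Gi
    # storageClassName: my-rwx-storage-class   # placeholder name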

@kxepal
Member

kxepal commented Jul 1, 2023

We faced this issue after upgrading from 2.5.1 to 2.6.2 and eventually solved it with this patch:

--- airflow/utils/log/file_task_handler.py.orig	2023-06-30 12:42:39 UTC
+++ airflow/utils/log/file_task_handler.py
@@ -324,7 +324,7 @@ class FileTaskHandler(logging.Handler):
         if ti.state in (TaskInstanceState.RUNNING, TaskInstanceState.DEFERRED) and not executor_messages:
             served_messages, served_logs = self._read_from_logs_server(ti, worker_log_rel_path)
             messages_list.extend(served_messages)
-        elif ti.state not in State.unfinished and not (local_logs or remote_logs):
+        elif not (local_logs or remote_logs):
             # ordinarily we don't check served logs, with the assumption that users set up
             # remote logging or shared drive for logs for persistence, but that's not always true
             # so even if task is done, if no local logs or remote logs are found, we'll check the worker

Not sure if it's the correct fix, but at least it works.
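
For context on what the change does: up_for_retry counts as an unfinished state, so the old guard "ti.state not in State.unfinished" skipped the served-logs fallback exactly in the case this issue describes. Reconstructed from the diff's context lines and simplified (illustrative, not the exact upstream code), the surrounding logic in FileTaskHandler._read with the patch applied reads roughly like this:

# Illustrative excerpt of FileTaskHandler._read with the patch applied;
# reconstructed and simplified from the diff above, not the full method.
if ti.state in (TaskInstanceState.RUNNING, TaskInstanceState.DEFERRED) and not executor_messages:
    # Task is still running or deferred: read from the worker's log server.
    served_messages, served_logs = self._read_from_logs_server(ti, worker_log_rel_path)
    messages_list.extend(served_messages)
elif not (local_logs or remote_logs):
    # Patched branch: it previously also required ti.state not in State.unfinished,
    # which excluded up_for_retry and left the UI without logs. Now the worker
    # log server is consulted whenever neither local nor remote logs are found.
    served_messages, served_logs = self._read_from_logs_server(ti, worker_log_rel_path)
    messages_list.extend(served_messages)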

@potiuk
Member

potiuk commented Jul 4, 2023

cc: @dstandish - I think you need to chime in here; I am not sure what the right condition for that if should be :)

@github-actions

github-actions bot commented Aug 4, 2023

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.

github-actions bot added the stale label on Aug 4, 2023
@github-actions

This issue has been closed because it has not received a response from the issue author.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 12, 2023
@vgutkovsk

Was this issue fixed? Same issue appears on v2.7.3

@kxepal
Member

kxepal commented Feb 8, 2024

Not sure, but we're still using the patch I provided here on 2.8.0. No harm at least. I could recheck with the next upgrade.

@carlospalol

Experiencing this with:

Airflow 2.9.1
Python 3.11
core__executor: CeleryExecutor
logging__remote_logging: False
logging__base_log_folder: /opt/airflow/logs
logging__delete_local_logs: False

When the task state changes to up_for_retry, the log changes to something like this:

airflow-worker-1.airflow-worker.redacted-airflow.svc.cluster.local

When the task finally fails, the log shows normally.

@carlospalol

Hi @potiuk, should this ticket be reopened? Do you want me to create a new ticket?

@kxepal
Member

kxepal commented Jun 3, 2024

> Not sure, but we're still using the patch I provided here on 2.8.0. No harm at least. I could recheck with the next upgrade.

I forgot to recheck whether the patch is still needed for 2.8.0+ releases; we still have it and there are no issues with the logs. So I guess it is still needed.

@carlospalol

Seems related to #39496.

Projects
None yet
Development

No branches or pull requests

7 participants