Airflow 1.10.7+ suppresses Operator logs #8484

Closed
kaxil opened this issue Apr 21, 2020 · 19 comments · Fixed by #9363
Labels: area:logging, kind:bug

@kaxil (Member)

kaxil commented Apr 21, 2020

First reported at https://issues.apache.org/jira/browse/AIRFLOW-6904 by Rahul Jain.

After upgrading from 1.10.2 to 1.10.9, we noticed that the Operator logs are no longer printed.

1.10.2:

[screenshot: task log in 1.10.2 showing the operator's log output]

1.10.9:

[screenshot: task log in 1.10.9 with the operator's log output missing]

This worked fine with Airflow <=1.10.6; the regression was most likely caused by our change to use os.fork (#6627).

@kaxil kaxil added the kind:bug This is a clearly a bug label Apr 21, 2020
@kaxil kaxil added this to the Airflow 1.10.11 milestone Apr 21, 2020
@rahulj51

Steps to reproduce:

  1. Create a simple DAG with a PythonOperator.
  2. In the Python function, print some logs, for example (a complete DAG sketch is included after the screenshots below):

     for i in range(1, 10):
         time.sleep(2)
         logging.info("running job..")

     logging.info("job completed")

  3. Run this with Airflow 1.10.6. Logs will be printed.

[screenshot: 1.10.6 task log showing the printed log lines]

  4. Run again with 1.10.9. Logs will not be printed.

[screenshot: 1.10.9 task log with no printed log lines]
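To make the steps above runnable end to end, here is a minimal sketch of a complete repro DAG. The dag_id, task_id, and start_date are illustrative assumptions; the imports use the Airflow 1.10.x module paths.

import logging
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_logs(**kwargs):
    # These are the log lines that go missing on 1.10.7+.
    for i in range(1, 10):
        time.sleep(2)
        logging.info("running job..")
    logging.info("job completed")


# dag_id and start_date are arbitrary values chosen for this repro.
dag = DAG(
    dag_id='repro_operator_logs',
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
)

PythonOperator(
    task_id='print_logs',
    python_callable=print_logs,
    provide_context=True,
    dag=dag,
)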

@davido912 (Contributor)

Did you manage to find a resolution to this?

@rahulj51 We ended up wiping our remote Airflow instance and setting up a new one, which solved the problem (for now). I didn't manage to find the reason behind this, but if you check your scheduler, you will see that the logs are being fired into the scheduler, just not into the task log file itself.

@rahulj51

Interesting. I actually tried this on a fresh docker container and it still didn't print the logs.

@ewjmulder

ewjmulder commented May 14, 2020

We experienced a similar issue after upgrading from 1.10.3 to 1.10.9: no more task logging visible in the UI. In our case it was specific to the KubernetesPodOperator; the BashOperator, for instance, was unaffected. We fixed it by adding a logger for the airflow.contrib package (in the main Airflow logging configuration file, see the snippet below), since the KubernetesPodOperator lives there. So if the operator whose logs you're missing is also from the contrib package, you can try this fix and see if the logging reappears. It probably also works for other operators, as long as you specify the right package as an extra logger.

My guess is that the combination of forking instead of starting a whole new process + having no specific logger for the package triggered this particular issue.

Added log config snippet:

        # Added under the 'loggers' key of the logging config.
        # bucket_handler and LOG_LEVEL are defined elsewhere in the same config.
        'airflow.contrib': {
            'handlers': bucket_handler,
            'level': LOG_LEVEL,
            'propagate': True,
        }

@davido912 (Contributor)

davido912 commented May 14, 2020

@ewjmulder You added this to airflow_local_settings.py?

And for airflow.contrib, did you import something there? I'm failing to find where and what you did, to be honest. I'd appreciate it if you could be a bit more specific.

@ewjmulder

@davido912 In the Airflow project the example config is indeed in the airflow_local_settings.py file, but you can customize it with the logging_config_class = ... setting in airflow.cfg (as we have done). Either way, it boils down to a dictionary variable called something like (DEFAULT_)LOGGING_CONFIG that contains (among others) the formatters, handlers and loggers properties. Inside that loggers property the various loggers are defined, and that is where you should add one whose key is the Airflow package name for which you are missing the logging (for instance airflow.contrib); see the example snippet in my comment above.
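To make that route concrete, here is a minimal sketch assuming a hypothetical config/log_config.py module that is importable by Airflow. The handler name 'task' comes from Airflow's default logging config; a setup with remote logging (like the bucket_handler above) may use a different handler list.

# config/log_config.py  (hypothetical module, must be on the PYTHONPATH)
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

# Add a logger for the package whose operator logs are missing,
# pointing it at the same handler the 'airflow.task' logger uses.
LOGGING_CONFIG['loggers']['airflow.contrib'] = {
    'handlers': ['task'],
    'level': 'INFO',
    'propagate': True,
}

Then point Airflow at it in airflow.cfg with logging_config_class = log_config.LOGGING_CONFIG (under [core] on 1.10.x).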

@davido912 (Contributor)

@ewjmulder seems to work, nice job!
We have 3 remote Airflow instances; I basically wiped one and reinstalled without any custom logging configs, and that worked as well.

@ewjmulder

Yes, I guess the issue is related to upgrading from an older version where you also (try to) migrate the log config, but somehow they are not fully compatible and this issue appears. If someone does a deep dive into the diffs in the logging config and code between 1.10.6 and 1.10.7, I'm sure some underlying reason will pop up. But I'm happy with the resolution that we have found :)

@jhtimmins (Contributor)

Could you all share the different operators where you've seen this happen? It looks like it's an issue on the KubernetesPodOperator?

I attempted to replicate with the PythonOperator but was unable to.

@ewjmulder

Indeed, for us it was a problem for the KubernetesPodOperator but not for the PythonOperator. Looking at the top post, it also affects the DataprocOperator, which also lives in the contrib package. So it seems the direction to look in is the loggers for specific packages and the default behavior when no logger is configured for a package.


@jblakeman

jblakeman commented May 18, 2020

I had a chance to skim through #6627, and I could be mistaken (I haven't found the time to grok all of the moving parts yet), but it seems like using os._exit in the forked child process without first flushing stdio buffers could be a factor:

try:
    args.func(args, dag=self.dag)
    os._exit(0)  # pylint: disable=protected-access
except Exception:  # pylint: disable=broad-except
    os._exit(1)  # pylint: disable=protected-access

From the Python docs:

    Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.

Perhaps what's in order is a flush() of stdout and stderr before calling os._exit?
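For illustration, a minimal sketch of that suggestion. This is a hypothetical wrapper mirroring the forked-child logic quoted above, not the actual Airflow patch:

import logging
import os
import sys


def run_and_exit(func, *args, **kwargs):
    """Run the task callable in the forked child, then hard-exit."""
    try:
        func(*args, **kwargs)
        exit_code = 0
    except Exception:  # pylint: disable=broad-except
        exit_code = 1

    # os._exit() skips cleanup handlers and does not flush stdio buffers,
    # so flush logging handlers and stdio explicitly before terminating.
    logging.shutdown()
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(exit_code)  # pylint: disable=protected-access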

@ashb (Member)

ashb commented May 20, 2020

Is anyone able to make a clean, stand-alone reproduction case that shows this please? (Ideally in the form of a docker image I can pull, or a git repo we can clone)

@kamikaze

kamikaze commented Jun 4, 2020

Same here. The only way to get logs working was downgrading to 1.10.7. It started happening for us with 1.10.9 (we skipped 1.10.8).

@kamikaze

kamikaze commented Jun 4, 2020

Is anyone able to make a clean, stand-alone reproduction case that shows this please? (Ideally in the form of a docker image I can pull, or a git repo we can clone)

Here is a DAG you can launch to see for yourself.

import logging.config
import os
from datetime import datetime, timedelta
from time import sleep

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] PID=%(process)d:TID=%(thread)d '
                      '%(name)s.%(funcName)s:%(lineno)d: %(message)s'
        },
    },
    'handlers': {
        'default': {
            'level': os.getenv('LOGGING_LEVEL', 'INFO'),
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'formatter': 'standard'
        },
    },
    'loggers': {
        '': {
            'handlers': ['default'],
            'level': os.getenv('LOGGING_LEVEL', 'INFO'),
            'propagate': True
        }
    }
}
logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger(__name__)

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 6, 10),
    'catchup': False,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(seconds=5),
    'max_retry_delay': timedelta(seconds=60),
    'retry_exponential_backoff': False,
}

dag = DAG(dag_id='aaa_test_logging', catchup=False, default_args=default_args,
          schedule_interval=None, max_active_runs=1)


def aaa_test_logging(**kwargs):
    logger.info('aaa_test_logging START before sleep')
    sleep(5*60)
    logger.info('aaa_test_logging END after sleep')


def generate():
    PythonOperator(
        task_id='aaa_test_logging',
        python_callable=aaa_test_logging,
        provide_context=True,
        dag=dag,
    )


generate()

@mik-laj (Member)

mik-laj commented Jun 7, 2020

Here is a workaround: #6627 (comment)

@kaxil kaxil linked a pull request Jun 19, 2020 that will close this issue
@kaxil kaxil self-assigned this Jun 24, 2020
@kaxil (Member, Author)

kaxil commented Jun 29, 2020

The fix will be available in 1.10.11

@kaxil kaxil removed this from the Airflow 1.10.11 milestone Jun 29, 2020
@shanit-saha

One way we could resolve the same problem on our Airflow 1.10.10 instance was by using a shell script as the entry point to execute the Python application, with the exit handled by the shell script. Earlier we invoked the Python application directly from the CMD entry point in the Docker image, and the logging issue was observed. Please note we also tried both sys.exit() and sys.stdout.flush() in our Python modules; that still did not help. So eventually the shell-script entry point with the exit in the script is what helped.
