
Writing Logs to Elasticsearch: asctime field in JSON log output set to null #10024

Closed
rgrizzell opened this issue Jul 27, 2020 · 2 comments · Fixed by #10515
Labels
kind:bug This is clearly a bug

Comments

@rgrizzell (Contributor)

rgrizzell commented Jul 27, 2020

Apache Airflow version: 1.10.11

Kubernetes version (if you are using kubernetes) (use kubectl version): N/A

Environment:

  • Cloud provider or hardware configuration: QEMU
  • OS (e.g. from /etc/os-release): Ubuntu 20.04 (and Fedora 32)
  • Kernel (e.g. uname -a): 5.4.0-42-generic (and 5.6.6-300.fc32.x86_64)
  • Install tools: Virtualenv
  • Others: Python 2.7.18 (and Python 3.8.3)

What happened:

If Airflow is set to log in plaintext, it prints log messages with asctime properly set:
[2020-07-22 15:13:13,783] {standard_task_runner.py:54} INFO - Started process 116982 to run task

With the configuration changed to print JSON, asctime is set to null:
{"asctime": null, "task_id": "runme_0", "execution_date": "2020_07_23T00_00_00_000000", "filename": "standard_task_runner.py", "lineno": 54, "message": "Started process 281332 to run task", "try_number": "1", "levelname": "INFO", "dag_id": "example_bash_operator"}

What you expected to happen:

The asctime field should contain a proper timestamp when the log message is printed in JSON.

How to reproduce it:

Using any standard default Airflow installation with the Elasticsearch subpackage installed, follow the instructions for Writing Logs to Elasticsearch. A running Elasticsearch instance is not required to reproduce this bug.

Diff of changes to `airflow.cfg`
--- airflow.cfg
+++ airflow.cfg.new
@@ -9,7 +9,7 @@
 
 # Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
 # Set this to True if you want to enable remote logging.
-remote_logging = False
+remote_logging = True
 
 # Users must supply an Airflow connection id that provides access to the storage
 # location.
@@ -754,7 +754,7 @@
 
 [elasticsearch]
 # Elasticsearch host
-host =
+host = localhost:9200
 
 # Format of the log_id, which is used to query for a given tasks logs
 log_id_template = {dag_id}-{task_id}-{execution_date}-{try_number}
@@ -768,10 +768,10 @@
 frontend =
 
 # Write the task logs to the stdout of the worker, rather than the default files
-write_stdout = False
+write_stdout = True
 
 # Instead of the default log formatter, write the log lines as JSON
-json_format = False
+json_format = True
 
 # Log fields to also attach to the json output, if enabled
 json_fields = asctime, filename, lineno, levelname, message

Run the example_bash_operator DAG to generate log output. Check log files or stdout for JSON output.

Anything else we need to know:

I was able to reproduce this consistently regardless of the environment I was testing in. However, I never tried setting the configuration via environment variables, nor did I run it in a containerized environment (such as Kubernetes).

rgrizzell added the kind:bug label on Jul 27, 2020
@boring-cyborg

boring-cyborg bot commented Jul 27, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@rgrizzell (Contributor, Author)

rgrizzell commented Aug 21, 2020

After a deeper dive into this issue, I believe it is the result of default behavior in the Python logging module.

When using the PercentStyle formatter (as JSONFormatter does by default with style='%'), if fmt is None when the class is instantiated, PercentStyle falls back to its default format, "%(message)s".

This matters because if %(asctime)s does not appear in the format string, Formatter.format() never calls the function responsible for generating the asctime field, so the attribute is never set on the LogRecord.
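This can be demonstrated with the standard library alone (a standalone sketch, independent of Airflow's code): logging.Formatter only populates record.asctime when its format string actually contains %(asctime)s.

```python
import logging

record = logging.LogRecord("demo", logging.INFO, __file__, 1,
                           "hello", None, None)

# The default format contains no %(asctime)s, so usesTime() is False,
# format() never calls formatTime(), and record.asctime is never set.
plain = logging.Formatter()
plain.format(record)
print(plain.usesTime())            # False
print(hasattr(record, "asctime"))  # False

# With %(asctime)s in the format, format() populates record.asctime.
timed = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
timed.format(record)
print(timed.usesTime())            # True
print(hasattr(record, "asctime"))  # True
```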

But if the asctime field doesn't exist, why does it show up in the log messages as null? Because asctime is listed in json_fields, JSONFormatter.format() merges the json_fields entries with the LogRecord's attributes, and the missing asctime resolves to None (serialized as null).
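A minimal sketch of that merge (hypothetical and simplified, not Airflow's actual JSONFormatter code): looking up a field that format() never set on the record falls back to None, which json serializes as null.

```python
import json
import logging

json_fields = ["asctime", "levelname", "message"]

record = logging.LogRecord("demo", logging.INFO, __file__, 1,
                           "Started process", None, None)
record.message = record.getMessage()  # format() normally does this

# The default format never set record.asctime, so the lookup
# falls back to None, which serializes as JSON null.
payload = {field: record.__dict__.get(field) for field in json_fields}
print(json.dumps(payload))
# {"asctime": null, "levelname": "INFO", "message": "Started process"}
```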

Workaround

The following patch works around the issue and demonstrates the standard-library behavior described above.

Index: airflow/utils/log/json_formatter.py
===================================================================
--- airflow/utils/log/json_formatter.py	(revision c35a01037ad4bb03d39a0319d37bc36b08ccf766)
+++ airflow/utils/log/json_formatter.py	(date 1598039818218)
@@ -25,6 +25,7 @@
 
 from airflow.utils.helpers import merge_dicts
 
+JSON_FMT = "%(asctime)s:%(levelname)s:%(name)s:%(message)s"
 
 class JSONFormatter(logging.Formatter):
     """
@@ -32,6 +33,8 @@
     """
     # pylint: disable=too-many-arguments
     def __init__(self, fmt=None, datefmt=None, style='%', json_fields=None, extras=None):
+        if fmt is None:
+            fmt = JSON_FMT
         super().__init__(fmt, datefmt, style)
         if extras is None:
             extras = {}

This patch generates the following log message:

{"asctime": "2020-08-21 18:42:33,625", "filename": "bash.py", "lineno": 162, "levelname": "INFO", "message": "Command exited with return code 0", "dag_id": "example_bash_operator", "task_id": "run_after_loop", "execution_date": "2020_08_20T00_00_00_000000", "try_number": "1"}
