[AIRFLOW-3439] Decode logs with 'utf-8' #4474

Merged
merged 1 commit into from
Jan 12, 2019

RasPavel
Contributor

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Code Quality

  • Passes flake8

Member

@XD-DENG XD-DENG left a comment

This change doesn't actually alter any behavior. Please see my line comment.

Please correct me if I'm wrong.

@@ -129,7 +129,7 @@ def gcs_read(self, remote_log_location):
         :type remote_log_location: str (path)
         """
         bkt, blob = self.parse_gcs_url(remote_log_location)
-        return self.hook.download(bkt, blob).decode()
+        return self.hook.download(bkt, blob).decode('utf-8')
Member

The default encoding of .decode() is already "utf-8". Please refer to https://docs.python.org/3/library/stdtypes.html#bytes.decode

So .decode('utf-8') is no different from .decode().

Member

@mik-laj mik-laj Jan 11, 2019

So .decode('utf-8') is no different from .decode().

I see it differently. Airflow supports Python 2.7 and 3.6+ (Source)

The Python 2.7 documentation says:

Python’s default encoding is the ‘ascii’ encoding.
(Source)

Another passage is also worth quoting:

str.decode([encoding[, errors]])
Decodes the string using the codec registered for encoding. encoding defaults to the default string encoding.
(Source)

Given the quotations above, the change proposed here does alter the program's behavior on Python 2.7.

I hope that the explanations are sufficient and clear.
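
As a rough illustration of the point above, here is a minimal sketch. It assumes Python 3, where the Python 2.7 default cannot be reproduced directly, so the 'ascii' codec is passed explicitly to stand in for what a bare .decode() would use there. The sample log text is made up for the example.

```python
# Sketch only: this runs on Python 3, where bytes.decode() defaults to UTF-8.
# Passing "ascii" explicitly imitates the Python 2.7 default quoted above.

raw = u"t\u00e2che termin\u00e9e".encode("utf-8")  # sample log bytes with non-ASCII text

# Explicit codec: same result regardless of the interpreter's default encoding.
explicit = raw.decode("utf-8")

# ASCII codec: fails on any byte above 0x7F, as a bare .decode() would on Python 2.7.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(explicit == u"t\u00e2che termin\u00e9e")
print(ascii_failed)
```

Both checks print True: the explicit 'utf-8' call round-trips the text, while the ASCII codec raises UnicodeDecodeError on the same bytes.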

Contributor Author

Exactly. I get an encoding error with Python 2.7; I forgot to mention that.

Member

I see. Thanks @mik-laj and @RasPavel for the clarification.

@feng-tao feng-tao merged commit 011c85a into apache:master Jan 12, 2019
@feng-tao
Member

lgtm

wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
kaxil pushed a commit that referenced this pull request Mar 17, 2020
kaxil pushed a commit that referenced this pull request Mar 19, 2020
kaxil pushed a commit to astronomer/airflow that referenced this pull request Mar 19, 2020
@acroos
Contributor

acroos commented Apr 29, 2020

This actually introduces a bug when using the json-file log driver. The pull method on APIClient returns a generator, which in this case yields dicts rather than bytes. The result is the following error:

[2020-04-29 11:14:59,010] {taskinstance.py:1145} ERROR - 'dict' object has no attribute 'decode'
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/docker_operator.py", line 269, in execute
    output = json.loads(l.decode('utf-8').strip())
AttributeError: 'dict' object has no attribute 'decode'
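
One possible way to tolerate both shapes is sketched below. It assumes the stream may yield either raw JSON bytes or already-parsed dicts, as the traceback above suggests. The name `parse_pull_line` is a hypothetical helper for illustration and is not part of Airflow or the Docker client.

```python
import json


def parse_pull_line(line):
    """Hypothetical helper: normalize one item from a pull stream to a dict."""
    if isinstance(line, dict):
        # Already parsed upstream; there is nothing to decode.
        return line
    if isinstance(line, bytes):
        # Decode explicitly so behavior does not depend on a default encoding.
        line = line.decode("utf-8")
    return json.loads(line.strip())


print(parse_pull_line(b'{"status": "Pulling"}\n'))
print(parse_pull_line({"status": "Pulling"}))
```

Checking the type first keeps the explicit 'utf-8' decode for byte input while avoiding the AttributeError when an item arrives as a dict.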


5 participants