Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5282] Add default timeout on kubeclient & catch HTTPError #5880

Merged
merged 5 commits into from
Aug 23, 2019

Conversation

rolanddb
Copy link

Make sure you have checked all steps below.

Jira

Description

This PR:

  • Adds a default _request_timeout in airflow.cfg, so that default installations will not hang indefinitely
  • Catches the errors in the scheduler, and retries scheduling the task, in a similar fashion to how ApiExcptions are caught. (
    except ApiException as e:
    self.log.warning('ApiException when attempting to run task, re-queueing. '
    'Message: %s' % json.loads(e.body)['message'])
    self.task_queue.put(task)
    )
  • Fixes a bug in the parsing of airflow.cfg. The notes state that a json dict is expected. This is parsed with json.loads, which returns a dict. But in the code, .iteritems() is called on this dict, which cannot work. .items() should be used instead.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@ashb ashb self-requested a review August 22, 2019 09:09
airflow/config_templates/default_airflow.cfg Outdated Show resolved Hide resolved
airflow/contrib/executors/kubernetes_executor.py Outdated Show resolved Hide resolved
@rolanddb
Copy link
Author

@andy-g14 I added the timeout configuration from your unit test as a default. This works at runtime, but tests are complaining about a KeyError.

f70a6aa#diff-2fd5967071561042847ef71c5f8b7e8dR216

Did you ever see this work? I tried various changes, e.g. kube_client_request_args = "hello world" runs fine, but it seems that adding json there breaks things..

kube_client_request_args =
# Note that if no _request_timeout is specified, the kubernetes client will wait indefinitely for kubernetes
# api responses, which will cause the scheduler to hang. The timeout is specified as [connect timeout, read timeout]
kube_client_request_args = {"_request_timeout" : [60,60] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kube_client_request_args = {"_request_timeout" : [60,60] }
kube_client_request_args = {{"_request_timeout" : [60,60] }}

The problem is this file isn't a plain config file, it's actually a giant python format string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting this in the ~/airflow/airflow.cfg wouldn't have the same problem, it's just this default config template that is the problem.

@andy-g14
Copy link
Contributor

@rolanddb , you will need to provide the value as json string, i.e enclosed in single quotes. it works for me this way
kube_client_request_args = '{"_request_timeout" : [10,100] }'

@ashb
Copy link
Member

ashb commented Aug 22, 2019

The problem in the other PR seemed to be around putting a default value in our config template (which is actually a giant python format string so {} -> {{}})

@rolanddb
Copy link
Author

The problem in the other PR seemed to be around putting a default value in our config template (which is actually a giant python format string so {} -> {{}})

Ah, I was very confused why it didn't work!

@ashb ashb merged commit 4636a30 into apache:v1-10-stable Aug 23, 2019
@ashb
Copy link
Member

ashb commented Aug 23, 2019

Nice!

Note that this new default config won't change the value for any existing installs, so some people might have to update their configs to get this new change.

ashb pushed a commit to ashb/airflow that referenced this pull request Aug 29, 2019
…pache#5880)

Catch urllib3 httperror in kube executor

(cherry picked from commit 4636a30)
kaxil pushed a commit that referenced this pull request Aug 30, 2019
…5880)

Catch urllib3 httperror in kube executor

(cherry picked from commit 4636a30)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants