DockerOperator: use DOCKER_HOST as default for docker_url #38387

Merged: @potiuk merged 1 commit into apache:main from docker-operator-docker-host-envvar on Mar 22, 2024

Conversation

Contributor

@maresb maresb commented Mar 21, 2024

Respect the DOCKER_HOST environment variable with the DockerOperator.


I was really surprised that the DOCKER_HOST environment variable is ignored. I was also surprised that I didn't find any proposal to add it, so apologies in case I missed a previous discussion. Might this be an acceptable change?

Of course this will break in situations where DOCKER_HOST is set, but the user is relying on the default value of docker_url to override this setting. Having such a setup is really begging for trouble, so I wouldn't feel so guilty about breaking those cases.
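
The intended precedence can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `resolve_docker_url` and `DEFAULT_DOCKER_URL` are hypothetical names, the default value is the socket URL mentioned elsewhere in this conversation, and treating an empty `DOCKER_HOST` as unset is an assumption.

```python
import os
from typing import Optional

# Default socket URL as quoted in this conversation; the constant name is hypothetical.
DEFAULT_DOCKER_URL = "unix://var/run/docker.sock"


def resolve_docker_url(docker_url: Optional[str] = None) -> str:
    """Pick the Docker daemon URL: explicit argument, then DOCKER_HOST, then the default."""
    if docker_url:
        # An explicitly passed docker_url always wins over the environment.
        return docker_url
    # Assumption: an empty DOCKER_HOST is treated the same as an unset one.
    return os.environ.get("DOCKER_HOST") or DEFAULT_DOCKER_URL
```

With `DOCKER_HOST=tcp://docker-proxy:2375` set, calling `resolve_docker_url()` would return the proxy URL, while `resolve_docker_url("unix://var/run/docker.sock")` would still return the socket URL.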

TODO if I get 👍 to proceed:

  • Add tests (any suggestions?)
  • In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
  • Anything else?


boring-cyborg bot commented Mar 21, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@Taragolis
Contributor

Add tests (any suggestions?)

Simple tests that check that DOCKER_HOST is respected, what happens when the variable is set to an empty string (edge case), and what happens when DOCKER_HOST is not set.

Tests are usually stored under tests/{path-from-airflow-package}, e.g. tests/providers/docker/operators/test_docker.py

Environment variables can be set in tests with the pytest monkeypatch fixture, something like this (just an example for reference):

class TestDockerOperator:
    ...

    def test_respect_docker_host_env(self, monkeypatch):
        monkeypatch.setenv("DOCKER_HOST", "foo.bar")
        assert DockerOperator(task_id="test-task", image="test").docker_url == "foo.bar"

    def test_docker_host_env_empty(self, monkeypatch):
        monkeypatch.setenv("DOCKER_HOST", "")
        assert DockerOperator(task_id="test-task", image="test").docker_url == "expected-host-here"

    def test_docker_host_env_not_set(self, monkeypatch):
        monkeypatch.delenv("DOCKER_HOST", raising=False)
        assert DockerOperator(task_id="test-task", image="test").docker_url == "expected-host-here"
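
For readers without pytest at hand, the same three cases can also be exercised with the standard library's `unittest.mock.patch.dict`. The sketch below runs against a hypothetical stand-in, `default_docker_url`, rather than the real `DockerOperator`. The fallback value and the handling of an empty `DOCKER_HOST` are assumptions, not something confirmed in this thread.

```python
import os
from unittest import mock

# Assumed fallback, taken from the socket URL mentioned in this conversation.
FALLBACK = "unix://var/run/docker.sock"


def default_docker_url() -> str:
    # Hypothetical stand-in for the operator's default.
    # Assumption: an empty DOCKER_HOST counts as unset.
    return os.environ.get("DOCKER_HOST") or FALLBACK


def check_cases() -> list:
    results = []
    # Case 1: DOCKER_HOST is set to a non-empty value.
    with mock.patch.dict(os.environ, {"DOCKER_HOST": "foo.bar"}):
        results.append(default_docker_url())
    # Case 2: DOCKER_HOST is set to an empty string (edge case).
    with mock.patch.dict(os.environ, {"DOCKER_HOST": ""}):
        results.append(default_docker_url())
    # Case 3: DOCKER_HOST is not set; patch.dict restores os.environ on exit.
    with mock.patch.dict(os.environ):
        os.environ.pop("DOCKER_HOST", None)
        results.append(default_docker_url())
    return results
```

`patch.dict` restores the original environment when each `with` block exits, so the cases do not leak into one another, which is the same isolation `monkeypatch` provides.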

In case of backwards incompatible changes please leave a note in a newsfragment file, named

Is it backward incompatible change?

@maresb
Contributor Author

maresb commented Mar 22, 2024

Thanks so much @Taragolis for the advice! I will look into it.

Is it backward incompatible change?

It depends on your definition of "backward incompatible". The only breakage I can imagine is quite contrived:

  • The user is running two Docker hosts, one on the default unix://var/run/docker.sock and the other on tcp://docker-proxy:2375. The user intends to use unix://var/run/docker.sock with Airflow and tcp://docker-proxy:2375 with everything else, so they set DOCKER_HOST=tcp://docker-proxy:2375 and rely otherwise on the default value of docker_url for Airflow.
  • The fix in this case is that, after this PR has been merged, the user must explicitly set docker_url="unix://var/run/docker.sock" in order to override the value in DOCKER_HOST.

Do you think this warrants a newsfragment?

@Taragolis
Contributor

Do you think this warrants a newsfragment?

We do not use newsfragments for provider changes, so that does not apply here.

However, if you think it might break someone's pipeline, then it is better to create a new major version of the provider.
In that case you have to add a new record to the versions list in provider.yaml, 4.0.0 in this case

versions:
- 3.9.2

Then add a note to CHANGELOG.rst about the breaking changes, how they can be resolved, and what users should change.

Some example for reference:

8.0.0
.....

Breaking changes
~~~~~~~~~~~~~~~~

.. warning::
  ``SlackHook`` and ``SlackWebhookHook`` constructor expected keyword-only arguments.
  Removed deprecated parameter ``token`` from the ``SlackHook`` and dependent operators.
  Required create ``Slack API Connection`` and provide connection id to ``slack_conn_id`` operators / hook,
  and the behavior should stay the same.
  Parsing Slack Incoming Webhook Token from the Connection ``hostname`` is removed, ``password`` should be filled.
  Removed deprecated parameter ``webhook_token`` from the ``SlackWebhookHook`` and dependent operators
  Required create ``Slack Incoming Webhook Connection`` and provide connection id to ``slack_webhook_conn_id``
  operators / hook, and the behavior should stay the same.
  Removed deprecated method ``execute`` from the ``SlackWebhookHook``. Use ``send``, ``send_text`` or ``send_dict`` instead.
  Removed deprecated parameters ``attachments``, ``blocks``, ``channel``, ``username``, ``username``,
  ``icon_emoji`` from the ``SlackWebhookHook``. Provide them directly to ``SlackWebhookHook.send`` method,
  and the behavior should stay the same.
  Removed deprecated parameter ``message`` from the ``SlackWebhookHook``.
  Provide ``text`` directly to ``SlackWebhookHook.send`` method, and the behavior should stay the same.
  Removed deprecated parameter ``link_names`` from the ``SlackWebhookHook`` and dependent operators.
  This parameter has no affect in the past, you should not provide it.
  If you want to mention user see: `Slack Documentation <https://api.slack.com/reference/surfaces/formatting#mentioning-users>`__.
  Removed deprecated parameters ``endpoint``, ``method``, ``data``, ``headers``, ``response_check``,
  ``response_filter``, ``extra_options``, ``log_response``, ``auth_type``, ``tcp_keep_alive``,
  ``tcp_keep_alive_idle``, ``tcp_keep_alive_idle``, ``tcp_keep_alive_count``, ``tcp_keep_alive_interval``
  from the ``SlackWebhookOperator``. Those parameters has no affect in the past, you should not provide it.


cc: @eladkal @potiuk do you think this change might be considered a breaking change?

@potiuk
Member

potiuk commented Mar 22, 2024

cc: @eladkal @potiuk do you think this change might be considered a breaking change?

I'd treat it as a bug fix but add a separate note in the changelog. DOCKER_HOST overriding the default value is a well-respected standard and we should follow it. It's very unlikely that someone sets DOCKER_HOST and does not want it to be used by default.

@eladkal
Contributor

eladkal commented Mar 22, 2024

cc: @eladkal @potiuk do you think this change might be considered a breaking change?

I'd treat it as a bug fix but add a separate note in the changelog. DOCKER_HOST overriding the default value is a well-respected standard and we should follow it. It's very unlikely that someone sets DOCKER_HOST and does not want it to be used by default.

I agree

@maresb maresb force-pushed the docker-operator-docker-host-envvar branch 2 times, most recently from a154e05 to 3bb3cac Compare March 22, 2024 12:20
@maresb
Contributor Author

maresb commented Mar 22, 2024

Thanks so much for the support!

I pushed some draft tests. I haven't set up testing locally, and I'm hoping that things are simple enough that I can use the CI test suite. Seems like I require approval, so I'll look into setting it up locally.

I'd treat it as a bug fix but add a separate note in the changelog.

It looks like the changelog is managed semi-automatically, so I'm not sure if I should edit the changelog by hand. If you want a one-liner, I could suggest:

Use the DOCKER_HOST environment variable as the default value for the docker_url parameter.

@potiuk
Member

potiuk commented Mar 22, 2024

It looks like the changelog is managed semi-automatically

Add a one-liner at the top, please (see the comment at the beginning of the changelog). We will almost certainly forget to add the line if it is not added now, and the release manager will have to remember to look back through all the PRs to see if one is missing.

@maresb
Contributor Author

maresb commented Mar 22, 2024

I think I've addressed all the feedback, and I'm optimistic that the tests are succeeding. I'll keep an eye on them and wait for everything to go green unless I hear more feedback. Thanks so much for the awesome review process!!!

@maresb maresb force-pushed the docker-operator-docker-host-envvar branch from 1f21acb to ddc2634 Compare March 22, 2024 14:26
@maresb maresb force-pushed the docker-operator-docker-host-envvar branch from ddc2634 to 3d7e433 Compare March 22, 2024 14:28
Member

@potiuk potiuk left a comment

Contributor

@Taragolis Taragolis left a comment

Nice

@potiuk potiuk merged commit 947c48b into apache:main Mar 22, 2024
46 checks passed

boring-cyborg bot commented Mar 22, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@maresb maresb deleted the docker-operator-docker-host-envvar branch March 22, 2024 16:25