
[AIRFLOW-4843] Allow orchestration via Docker Swarm (SwarmOperator) #5489

Merged — 5 commits merged into apache:master on Aug 13, 2019

Conversation

@akki (Contributor) commented Jun 26, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow JIRA issue (AIRFLOW-4843) and references it in the PR title.
    • https://issues.apache.org/jira/browse/AIRFLOW-4843
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

Tests

  • My PR adds the following unit tests:
    SwarmOperatorTestCase

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain docstrings that explain what they do
    • If you implement backwards-incompatible changes, please leave a note in Updating.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

Add support for running Docker containers via Docker Swarm
which allows the task to run on any machine (node) which
is a part of your Swarm cluster

More details: https://issues.apache.org/jira/browse/AIRFLOW-4843

Built with <3 at Agoda!
@akki (Contributor, Author) commented Jul 2, 2019

@mik-laj Thanks, your examples helped me understand the pylint errors better. I've fixed them now and also moved the operator class to contrib.

@mik-laj (Member) commented Jul 2, 2019

@potiuk Can you look at this, as our Docker expert?

@potiuk (Member) left a review

Very nice and well documented. I like it. Just a few comments to address.

Review threads (outdated, resolved):
  • airflow/contrib/operators/swarm_operator.py
  • tests/operators/test_swarm_operator.py
  • airflow/example_dags/example_swarm_operator.py
@akki (Contributor, Author) commented Jul 5, 2019

@potiuk Apologies for taking this long (got stuck with something else). I have addressed your above comments. Please have a look.

The build (doc) failure doesn't seem related to my code. Furthermore, I see the same error in another build of a totally unrelated PR, so I suspect Travis itself is having issues.

Please let me know if this looks good to you now. Thanks :)

@akki force-pushed the upstream-swarm-operator branch from ad95515 to b1fc817 on Jul 9, 2019
@codecov-io commented

Codecov Report

Merging #5489 into master will increase coverage by 0.02%.
The diff coverage is 97.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5489      +/-   ##
==========================================
+ Coverage      79%   79.02%   +0.02%     
==========================================
  Files         489      492       +3     
  Lines       30726    30762      +36     
==========================================
+ Hits        24275    24311      +36     
  Misses       6451     6451
Impacted Files Coverage Δ
airflow/operators/docker_operator.py 96.66% <100%> (+0.07%) ⬆️
airflow/utils/strings.py 100% <100%> (ø)
...flow/example_dags/example_docker_swarm_operator.py 100% <100%> (ø)
airflow/contrib/operators/docker_swarm_operator.py 96.55% <96.55%> (ø)
airflow/contrib/operators/ssh_operator.py 83.75% <0%> (+1.25%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1f9d9a...b1fc817. Read the comment docs.

@akki (Contributor, Author) commented Jul 9, 2019

Just force-pushed to trigger Travis again; it's green now!

Would anyone like to review this now and let me know their feedback please?

@mik-laj (Member) commented Jul 14, 2019

I looked at this code, and the _execute method seems unreadable to me. I think it would be worth creating two private methods, pull_images and run_images. WDYT?

@akki (Contributor, Author) commented Jul 15, 2019

@mik-laj
I see your point regarding _execute. I think _execute is actually doing what a run_image method should be doing, so it makes sense to rename it to something like _run_image.

Regarding the pull-image part of the code, I think it is short and straightforward enough to remain inside execute (I'll add a comment to it, though). Also, that code already exists in Airflow and is not something introduced by this PR, so I'm not sure it should be changed in this ticket. If it still makes sense to you, maybe it can be done in a separate PR/commit, but personally I don't see anything wrong with it.

Thanks for reviewing, much appreciated! :)

@akki (Contributor, Author) commented Jul 24, 2019

@mik-laj I went ahead and updated the PR to incorporate your suggestions, as explained in my last comment.
I would appreciate it if you could have a look and see whether it looks good to you now or still requires changes.

@mik-laj self-requested a review on Jul 24, 2019
@mik-laj (Member) commented Aug 12, 2019

@akki
I tested this operator. I was able to run the DAG correctly once I had configured TLS correctly.

    t1 = DockerSwarmOperator(
        api_version='auto',
        docker_url='tcp://GCP-IP:2376',
        tls_ca_cert="/Users/XXX/.docker/machine/machines/swarm-manager-1/ca.pem",
        tls_client_cert="/Users/XXX/.docker/machine/certs/cert.pem",
        tls_client_key="/Users/XXX/.docker/machine/certs/key.pem",
        command='/bin/sleep 10',
        image='centos:latest',
        auto_remove=True,
        task_id='sleep_with_swarm',
    )
[2019-08-12 17:50:39,941] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-08-12 17:50:40,500] {dagbag.py:86} INFO - Filling up the DagBag from /Users/kamilbregula/airflow/dags
[2019-08-12 17:50:45,949] {taskinstance.py:614} INFO - Dependencies all met for <TaskInstance: docker_swarm_sample.sleep_with_swarm 2018-01-01T00:00:00+00:00 [None]>
[2019-08-12 17:50:45,956] {taskinstance.py:614} INFO - Dependencies all met for <TaskInstance: docker_swarm_sample.sleep_with_swarm 2018-01-01T00:00:00+00:00 [None]>
[2019-08-12 17:50:45,956] {taskinstance.py:832} INFO - 
--------------------------------------------------------------------------------
[2019-08-12 17:50:45,957] {taskinstance.py:833} INFO - Starting attempt 1 of 1
[2019-08-12 17:50:45,957] {taskinstance.py:834} INFO - 
--------------------------------------------------------------------------------
[2019-08-12 17:50:45,957] {taskinstance.py:853} INFO - Executing <Task(DockerSwarmOperator): sleep_with_swarm> on 2018-01-01T00:00:00+00:00
[2019-08-12 17:50:51,107] {docker_operator.py:257} INFO - Pulling docker image centos:latest
[2019-08-12 17:50:52,445] {docker_operator.py:261} INFO - Pulling from library/centos
[2019-08-12 17:50:52,752] {docker_operator.py:261} INFO - Pulling fs layer
[2019-08-12 17:50:53,264] {docker_operator.py:261} INFO - Downloading
    ... (repeated "Downloading" lines trimmed)
[2019-08-12 17:50:53,709] {docker_operator.py:261} INFO - Verifying Checksum
[2019-08-12 17:50:53,710] {docker_operator.py:261} INFO - Download complete
[2019-08-12 17:50:53,834] {docker_operator.py:261} INFO - Extracting
    ... (repeated "Extracting" lines trimmed)
[2019-08-12 17:50:59,090] {docker_operator.py:261} INFO - Pull complete
[2019-08-12 17:50:59,091] {docker_operator.py:261} INFO - Digest: sha256:a799dd8a2ded4a83484bbae769d97655392b3f86533ceb7dd96bbac929809f3c
[2019-08-12 17:50:59,092] {docker_operator.py:261} INFO - Status: Downloaded newer image for centos:latest
[2019-08-12 17:50:59,092] {docker_swarm_operator.py:106} INFO - Starting docker service from image centos:latest
[2019-08-12 17:50:59,159] {docker_swarm_operator.py:123} INFO - Service started: {'ID': '3vji4mflkfsyavqb2spl2vauv'}
[2019-08-12 17:51:14,973] {docker_swarm_operator.py:135} INFO - Service status before exiting: complete

@akki (Contributor, Author) commented Aug 13, 2019

@mik-laj Thank you for reviewing the PR in so much detail and for approving it.

As far as I understand, I need another approval to move this PR ahead. @potiuk, since you already had a look earlier and I have addressed your requested changes, could you please have another look?

@mik-laj (Member) commented Aug 13, 2019

No additional approvals are needed.

@mik-laj merged commit 3e2a027 into apache:master on Aug 13, 2019
@acordiner commented

This is great! Any chance to add the ability to pass extra arguments to the TaskTemplate and ContainerSpec? For example, it would be handy to be able to specify labels and placement constraints.
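To illustrate what such extra arguments would map to: the Docker Engine API's /services/create endpoint takes a TaskTemplate containing a ContainerSpec (which carries labels) and a Placement section (which carries constraints). Below is a hedged sketch of that payload as plain dicts; the build_service_spec helper and its parameters are hypothetical illustrations, not part of this PR or the operator's actual request-building code:

```python
import json

# Sketch of the service-spec payload accepted by Docker Swarm's
# /services/create Engine API endpoint. "Labels" and
# "Placement.Constraints" are the fields that the requested extra
# operator arguments would need to expose. Illustrative only.
def build_service_spec(image, command, labels=None, constraints=None):
    return {
        "Name": "airflow-task",
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": image,
                "Command": command,
                "Labels": labels or {},  # container labels
            },
            "Placement": {
                # e.g. ["node.role == worker", "node.labels.zone == eu"]
                "Constraints": constraints or [],
            },
            "RestartPolicy": {"Condition": "none"},
        },
    }


spec = build_service_spec(
    "centos:latest",
    ["/bin/sleep", "10"],
    labels={"app": "airflow"},
    constraints=["node.role == worker"],
)
print(json.dumps(spec, indent=2))
```

With docker-py, the same structure is normally built via docker.types.TaskTemplate, docker.types.ContainerSpec, and docker.types.Placement rather than raw dicts.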

@akki (Contributor, Author) commented Aug 15, 2019

@acordiner thanks. If the feature is supported by the Python Docker client API, it should be pretty straightforward to add it.

@akki (Contributor, Author) commented Aug 15, 2019

@acordiner
That's great. Would you like to write down your thoughts/ideas as a ticket in JIRA? Airflow only accepts PRs/commits that are tracked in its JIRA.

@akki deleted the upstream-swarm-operator branch on Oct 30, 2019
@hredestig commented

@akki Do you have an idea of how to get logs from the tasks to show up in the web interface with this? I only get start and stop; output from the task itself is not captured.

[2019-11-04 16:27:30,337] {{docker_swarm_operator.py:125}} INFO - Service started: {'ID': 'mxjn03sm32kfm8bcczs4qw6tu'}
[2019-11-04 16:27:32,909] {{docker_swarm_operator.py:136}} INFO - Service status before exiting: complete
[2019-11-04 16:27:37,298] {{logging_mixin.py:95}} INFO - [2019-11-04 16:27:37,298] {{local_task_job.py:105}} INFO - Task exited with return code 0

@akki (Contributor, Author) commented Nov 5, 2019

@hredestig I wrote a patch for that a few weeks ago but haven't sent it upstream yet. Thanks for pointing it out.
I'll try to create a JIRA ticket and send a PR for it in a few days; or, if you want to create the JIRA ticket yourself, please feel free to do so.

@hredestig commented

Thanks; sure, I made AIRFLOW-5850.

@akki (Contributor, Author) commented Nov 12, 2019

@hredestig You can find the PR here - #6552. Please see if it looks good to you.

Also, for anyone wondering what this feature/commit is about, I wrote a little bit here - https://medium.com/analytics-vidhya/orchestrating-airflow-tasks-with-docker-swarm-69b5fb2723a7.

ashb pushed a commit to ashb/airflow that referenced this pull request Dec 18, 2019
…pache#5489)

* [AIRFLOW-4843] Allow orchestration via Docker Swarm (SwarmOperator)

Add support for running Docker containers via Docker Swarm
which allows the task to run on any machine (node) which
is a part of your Swarm cluster

More details: https://issues.apache.org/jira/browse/AIRFLOW-4843

Built with <3 at Agoda!

(cherry picked from commit 3e2a027)
ashb added a commit to ashb/airflow that referenced this pull request Dec 18, 2019
ashb added a commit to ashb/airflow that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 18, 2019 (same cherry-pick of 3e2a027)
ashb pushed a commit that referenced this pull request Dec 19, 2019 (same cherry-pick of 3e2a027)
kaxil pushed a commit that referenced this pull request Dec 19, 2019 (same cherry-pick of 3e2a027)
6 participants