Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: kedro-airflow DAG kwarg configuration #233

Merged
merged 8 commits into from
Aug 1, 2023

Conversation

sbrugman
Copy link
Contributor

@sbrugman sbrugman commented Jun 12, 2023

Description

Implementation for #229

Development notes

Feature implementation for configuring kedro Airflow DAGs via airflow.yml.
The arguments can be configured per pipeline.

The arguments are rendered in the template. Initially I attempted specifying variables to change directly in the yaml file. However, rendering the variables in Python requires dealing with the data types of each argument (e.g. datetime, dict, string, bool ...). This added a lot of complexity to the plugin/template. In practice, the user can take care of this simply by providing a (small) template.

In the setup, the user can update the jinja template for new arguments, then can use airflow.yml:

airflow.yml:

# Global
default:
   max_active_runs: 3
   schedule_interval: "@once"
   start_month: 1
   owner: "airflow"

# Per Pipeline
data_science:
   start_month: 2
   owner: "airflow-ds"   

Additionally, parameters can be passed with the --params CLI option.

The configuration pattern is also customizable.

Tests are present.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@sbrugman sbrugman marked this pull request as ready for review June 13, 2023 08:40
@sbrugman
Copy link
Contributor Author

@noklam @astrojuanlu @deepyaman Could anyone please have a brief review of this PR and #241? Conceptually they are relatively small changes, so they should not take much time to review. They do enable us and other users to switch from kedro-airlfow-k8s to this official plugin!

Feel free to be critical and brief. Happy to put in the refactor work to get these features supported!

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some high-level comments, I didn't review the details of the implementation yet.

Can you provide some steps how can I test with this new feature? I also expect there will be some unit tests or e2e test to make sure it works.

kedro-airflow/kedro_airflow/plugin.py Show resolved Hide resolved
kedro-airflow/kedro_airflow/plugin.py Outdated Show resolved Hide resolved
@sbrugman sbrugman force-pushed the kedro-airflow-config branch 2 times, most recently from 3fdc664 to b18b985 Compare June 26, 2023 15:57
@sbrugman sbrugman force-pushed the kedro-airflow-config branch 3 times, most recently from 72639bc to df20823 Compare July 11, 2023 07:33
@sbrugman sbrugman marked this pull request as draft July 12, 2023 10:59
@sbrugman sbrugman force-pushed the kedro-airflow-config branch 2 times, most recently from d29c687 to c37ffbf Compare July 12, 2023 13:09
@sbrugman sbrugman marked this pull request as ready for review July 12, 2023 13:11
@sbrugman
Copy link
Contributor Author

sbrugman commented Jul 12, 2023

@noklam Could you please have another look? The implementation is now simpler and has proper tests.

In addition to the configuration this PR also:

  • Adds help messages to the CLI arguments (contributing to Provide more helpful message for kedro airflow create -h #244)
  • Handles the case when a pipeline is specified but does not exists more gracefully
  • Adds documentation on how a user would use (native) Airflow parameters
  • Handles whitespaces in the resulting DAG better

@sbrugman sbrugman changed the title feature: kedro-airflow DAG kwarg configuration feat: kedro-airflow DAG kwarg configuration Jul 12, 2023
kedro-airflow/README.md Outdated Show resolved Hide resolved
kedro-airflow/test_requirements.txt Outdated Show resolved Hide resolved
Copy link

@luizvbo luizvbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments.

Copy link

@luizvbo luizvbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the plugin on a kedro starter project and the result seems correct.

@sbrugman
Copy link
Contributor Author

Thanks @luizvbo! Processed your comments.

@sbrugman sbrugman force-pushed the kedro-airflow-config branch 2 times, most recently from 1ec5169 to 002093b Compare July 12, 2023 16:55
@astrojuanlu
Copy link
Member

Happy to give this a deeper look when the linting and test failures are addressed

@sbrugman
Copy link
Contributor Author

A from __future__ import annotations import was missing, causing the lints/tests to fail on older Python versions. Fixed that.

@astrojuanlu
Copy link
Member

Nice, thanks - the last thing is some failures on Windows:

    def test_create_airflow_dag_nonexistent_pipeline(cli_runner, metadata):
        """Test executing with a non-existing pipeline"""
        command = ["airflow", "create", "--pipeline", "de"]
        result = cli_runner.invoke(commands, command, obj=metadata)
        assert result.exit_code == 1
>       assert (
            "kedro.framework.cli.utils.KedroCliError: Pipeline de not found."
            in result.stdout
        )
E       assert 'kedro.framework.cli.utils.KedroCliError: Pipeline de not found.' in ''

Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @sbrugman! I've left some comments about it's compatibility with OmegaConfigLoader which needs to be addressed! :)

kedro-airflow/kedro_airflow/plugin.py Outdated Show resolved Hide resolved
kedro-airflow/kedro_airflow/plugin.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another round of review and I tested it with a starter project. Some minor comments, mostly the README/Release notes. I think this is very close to the finish line, I'll add more reviewers for approval. 💯

kedro-airflow/README.md Show resolved Hide resolved
kedro-airflow/README.md Outdated Show resolved Hide resolved
kedro-airflow/RELEASE.md Outdated Show resolved Hide resolved
kedro-airflow/kedro_airflow/plugin.py Outdated Show resolved Hide resolved
kedro-airflow/README.md Outdated Show resolved Hide resolved
kedro-airflow/RELEASE.md Show resolved Hide resolved
@ankatiyar
Copy link
Contributor

@sbrugman I'm sorry, I believe committing suggestions directly from here don't get signed off making the DCO bot fail. 😅

@sbrugman sbrugman force-pushed the kedro-airflow-config branch 3 times, most recently from 0f31ce1 to e8f2d7e Compare July 14, 2023 13:45
@sbrugman
Copy link
Contributor Author

Let me know if there is anything else there is to be done on our side to help!
Quite eager to start using this feature in production.

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbrugman Sorry for blocking this PR for so long, I have missed this notification. If you're on Slack feel free to ping me directly.

I made a commit to fix some minor thing about code style. I have tested both --params and airflow.yml, works pretty well for me.

One thing that is less intuitive to me is the airflow.yml parameters doesn't work the same as Kedro's params, as the top level key is pipeline. i.e. if you want to overide a subkey pipeline.key=3, you will do kedro airflow create -p pipeline --params key=3 instead of kedro airflow create --params pipeline.key=3. I think this is fine but just want to make a note about this.

In addition, it may be helpful to create a kedro airflow init to generate the airflow.yml with default argument at some point, I've manually test by creating this file for the sake of review.

Excellent work⭐️ We should follow up with a release after this PR.

Comment on lines +37 to +40
if "airflow" not in context.config_loader.config_patterns.keys():
context.config_loader.config_patterns.update( # pragma: no cover
{"airflow": ["airflow*", "airflow/**"]}
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 I like this.

kedro-airflow/README.md Outdated Show resolved Hide resolved
@noklam noklam removed the request for review from deepyaman July 28, 2023 09:05
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a super minor comment, but mainly came here to say thank you for this great contribution @sbrugman ! 🤩 I haven't been able to try this yet, but the code looks good and thanks for adding docs as well 👍

kedro-airflow/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sbrugman for this! I hope you don't mind that I made some commits to resolve conflicts and incorporate the suggestions to the docs from the reviews. Very happy about getting this merged in! 🥳 💯

@ankatiyar ankatiyar merged commit b0ca997 into kedro-org:main Aug 1, 2023
16 checks passed
@sbrugman sbrugman deleted the kedro-airflow-config branch August 9, 2023 20:23
@sbrugman
Copy link
Contributor Author

sbrugman commented Aug 9, 2023

Thank you everyone for the collaboration! This feature allows us to make an important step in our usage of scheduled Kedro nodes in many real-world pipelines 👌

Great to see that this was merged during my holidays 🌴

@noklam
Copy link
Contributor

noklam commented Aug 9, 2023

@sbrugman Thank you for your contribution and hope you have enjoyed your holiday! We have already done a release so please go ahead and use it!

PtrBld pushed a commit to PtrBld/kedro-plugins that referenced this pull request Aug 27, 2023
* feature: kedro-airflow DAG kwarg configuration

Signed-off-by: Simon Brugman <[email protected]>

* feat: `kedro-airflow` DAG kwarg configuration

Signed-off-by: Simon Brugman <[email protected]>

* Consistent use of the __future__ annotations

Signed-off-by: Nok <[email protected]>

* Apply suggestions from code review

Signed-off-by: Ankita Katiyar <[email protected]>

Co-authored-by: Merel Theisen <[email protected]>

---------

Signed-off-by: Simon Brugman <[email protected]>
Signed-off-by: Nok <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Nok <[email protected]>
Co-authored-by: Ankita Katiyar <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Issue/PR opened by the open-source community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants