kedro-airflow: Extend grouping strategies #673

Closed
ankatiyar opened this issue May 9, 2024 · 2 comments


@ankatiyar
Contributor

Description

kedro-org/kedro#3094 lists a number of pain points users run into when deploying their Kedro projects to MLOps platforms. One of them is that kedro-airflow maps each Kedro node 1:1 to an Airflow task.

#241 added the --group-by-memory flag, which makes it possible to group nodes that share MemoryDatasets into a single Airflow task.
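To make the memory-based strategy concrete, here is a minimal Python sketch of the idea behind --group-by-memory: nodes that produce or consume the same in-memory dataset are merged into one task with a union-find structure. This is not the kedro-airflow implementation. The Node dataclass and the group_by_memory function are hypothetical, and the set of memory datasets is passed in directly instead of being derived from catalog.yml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node: a name plus dataset names."""
    name: str
    inputs: tuple[str, ...] = ()
    outputs: tuple[str, ...] = ()


def group_by_memory(nodes: list[Node], memory_datasets: set[str]) -> list[list[str]]:
    """Return lists of node names; each list would become one Airflow task."""
    parent = {n.name: n.name for n in nodes}

    def find(x: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Map each in-memory dataset to every node that reads or writes it.
    touching: dict[str, list[str]] = {}
    for n in nodes:
        for ds in (*n.inputs, *n.outputs):
            if ds in memory_datasets:
                touching.setdefault(ds, []).append(n.name)

    # Nodes sharing an in-memory dataset must run in the same task/process.
    for names in touching.values():
        for other in names[1:]:
            union(names[0], other)

    # Collect groups, preserving the original node order within each group.
    groups: dict[str, list[str]] = {}
    for n in nodes:
        groups.setdefault(find(n.name), []).append(n.name)
    return list(groups.values())


nodes = [
    Node("preprocess", ("raw",), ("clean",)),
    Node("featurize", ("clean",), ("features",)),
    Node("train", ("features",), ("model",)),
]
print(group_by_memory(nodes, {"clean"}))  # [['preprocess', 'featurize'], ['train']]
```

Union-find is used so that chains of shared memory datasets (A feeds B in memory, B feeds C in memory) collapse transitively into one group.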

This ticket proposes extending the grouping strategies offered by kedro-airflow.
Some options to consider:

Suggestion

  • Change the design of --group-by-memory to a more general option such as --grouping-strategy=<nodes/pipeline/memory> or --group-by=<> that takes the strategy as an argument. This would make it easy to add grouping strategies in the future, depending on what users actually want or need.
  • Gather user input on what grouping strategies would be useful
@ankatiyar
Contributor Author

Discussed briefly in Tech Design on 15/5/24:

  • Grouping by non-persistent datasets is not an ideal way to decide how nodes are grouped into tasks, because generating DAGs from the state of catalog.yml has some downsides:
    • The catalog used when generating the DAGs is not necessarily the catalog used during deployment.
    • Whether a dataset is a MemoryDataset can also depend on which configuration environment is used.
  • The recommended grouping strategy should be namespaces, but namespaces are not yet widely adopted.
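For comparison, a namespace-based strategy can be sketched in a few lines of Python. The sketch assumes that node names carry their namespace as a dot-separated prefix (for example data_processing.preprocess) and groups by the top-level prefix only; nodes without a namespace each become their own task. The group_by_namespace function is hypothetical and not part of kedro-airflow.

```python
from collections import defaultdict


def group_by_namespace(node_names: list[str]) -> dict[str, list[str]]:
    """Group node names by top-level namespace; each group would become one task."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in node_names:
        # "data_processing.preprocess" -> "data_processing";
        # a name without a dot is returned unchanged, so it forms its own group.
        namespace = name.split(".", 1)[0]
        groups[namespace].append(name)
    return dict(groups)


print(group_by_namespace(["dp.a", "dp.b", "ds.train", "report"]))
# {'dp': ['dp.a', 'dp.b'], 'ds': ['ds.train'], 'report': ['report']}
```

Because this only inspects the pipeline structure, the resulting grouping does not depend on catalog.yml or on the configuration environment, which avoids both downsides listed above.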

Follow up actions:

  • Release the feature introduced in "feat: kedro-airflow group in memory nodes" (#241) as it is now; if we decide on a different way forward, we can always make a breaking change to kedro-airflow.
  • Investigate the adoption of namespaces and ways to simplify their usage.

@ankatiyar
Contributor Author

Based on the discussion above, closing this to focus on the adoption of namespaces on Kedro side.

@ankatiyar closed this as not planned on May 24, 2024.