Decide new name for "reused pipelines with namespaces" #4016

Open
astrojuanlu opened this issue Jul 17, 2024 · 2 comments
Labels
Issue: Feature Request (new feature or improvement to existing feature)

Comments

@astrojuanlu
Member

Description

Come up with a name for the following pattern:

pipeline(
    base_data_science,
    namespace="ds_2",
    parameters={"params:model_options": "params:model_options_2"},
    inputs={"model_input_table"},
)

(taken from https://docs.kedro.org/en/latest/nodes_and_pipelines/namespaces.html#what-is-a-namespace)

This is a child task of #2723. After coming up with the right name, we should adjust our documentation and training materials.

Context

People often refer to the above as "modular pipelines", even though we have already established that this is an abuse of terminology (see #2723). In fact, one can have pipelines with namespaces that aren't modular, and modular pipelines that don't use namespaces.

In #3948 we reworked the docs and we already got signals from users that they discovered namespaces thanks to it! https://linen-slack.kedro.org/t/22686809/this-is-exciting-that-s-all-https-docs-kedro-org-en-latest-n#c2f45225-8c17-4153-97b7-ae974a779c27

This shows the importance of properly naming things so that they can be described and taught properly.

Also notice that users can reuse pipelines without specifying namespaces, as demonstrated in the first code snippet of https://docs.kedro.org/en/latest/nodes_and_pipelines/namespaces.html

    return pipeline(
        existing_pipeline,  # The existing Pipeline object
        inputs={"old_input_df_name": "new_input_df_name"},  # Map an existing pipeline input to a new input
        outputs={"old_output_df_name": "new_output_df_name"},  # Map an existing pipeline output to a new output
        parameters={"params:model_options": "params:new_model_options"},  # Remap parameters (no space after "params:")
    )

The docs clarify that doing this on its own is of limited use, though, because "In Kedro, you cannot run pipelines with the same node names". So this "pipeline inheritance" (?) plus the concept of namespaces is what enables actual pipeline reuse.
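To make the point about node names concrete, here is a minimal toy model in plain Python. It does not use Kedro and is not how Kedro implements namespaces; `Node`, `combine`, and `with_namespace` are hypothetical helpers, and the node and dataset names other than `model_input_table` are made up for illustration. It only sketches why combining two copies of the same pipeline collides, and how a namespace prefix avoids the collision. Parameter remapping is left out for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a pipeline node: a name plus dataset names."""
    name: str
    inputs: tuple
    outputs: tuple


def combine(*pipelines):
    """Concatenate node lists, rejecting duplicate node names."""
    combined, seen = [], set()
    for nodes in pipelines:
        for n in nodes:
            if n.name in seen:
                raise ValueError(f"duplicate node name: {n.name}")
            seen.add(n.name)
            combined.append(n)
    return combined


def with_namespace(nodes, namespace, shared=()):
    """Copy nodes, prefixing node and dataset names with the namespace.

    Dataset names listed in `shared` keep their original, un-prefixed name.
    """
    def rename(dataset):
        return dataset if dataset in shared else f"{namespace}.{dataset}"

    return [
        replace(
            n,
            name=f"{namespace}.{n.name}",
            inputs=tuple(rename(d) for d in n.inputs),
            outputs=tuple(rename(d) for d in n.outputs),
        )
        for n in nodes
    ]


# Hypothetical base pipeline with two nodes.
base = [
    Node("split_data", ("model_input_table",), ("X_train", "y_train")),
    Node("train_model", ("X_train", "y_train"), ("regressor",)),
]

# Reusing the pipeline as-is collides on node names.
try:
    combine(base, base)
    collided = False
except ValueError:
    collided = True

# Giving each copy its own namespace makes every node name unique,
# while both copies still read the shared input dataset.
ds_1 = with_namespace(base, "ds_1", shared={"model_input_table"})
ds_2 = with_namespace(base, "ds_2", shared={"model_input_table"})
both = combine(ds_1, ds_2)
node_names = [n.name for n in both]
```

In this sketch the two copies end up with node names such as `ds_1.train_model` and `ds_2.train_model`, which is the visual "nesting" that the naming proposals below allude to.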

Possible Implementation

  • "Nested pipelines" (inheritance + namespace = nesting, and it also makes it visually clear what happens)

Possible Alternatives

  • "Namespaced pipelines"
  • "Namespace pipelines"
  • "Sub pipelines"
  • "Reused pipelines"
  • Just "namespaces" (although I consider this an abuse of terminology too)

astrojuanlu added the "Issue: Feature Request" label on Jul 17, 2024
@DimedS
Contributor

DimedS commented Jul 17, 2024

Thanks for the issue, @astrojuanlu. I agree that we should continue to clarify in our docs how to better use namespaces. For that, we should take into account what @idanov mentioned in the recent demo: namespaces are not only a way to reuse pipelines (as clarified in the recent docs update) but also a way to better structure pipelines. This aspect is currently not well covered, but it is valuable for users in terms of visualisation and deployment. So it's good that we now have a dedicated page for namespaces. I believe the page name should reflect how namespaces help both to reuse and to better structure pipelines.

@yury-fedotov
Contributor

This is just how I see it; everything below is my own mental model.

I like to think that there are 3 fundamental archetypes of Pipeline objects I'm creating. They are all instances of Pipeline, but they are obtained very differently:

  • Abstract templates. These are never registered themselves, never point to actual catalog entries, and only exist to be passed to the pipeline() wrapper.
  • Namespaced instances of abstract templates. These are the results of applying the pipeline() wrapper to abstract templates. They are registered, and the whole idea is to reuse the abstract template while pointing it to actual catalog entries.
  • Nonmodular pipelines. A good example is this. These point to specific catalog entries right away (in the node definitions) and aren't intended to leverage namespaces. They operate at the root namespace of the project.
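As a rough illustration of the three archetypes, the following plain-Python sketch models a pipeline as a list of `(node_name, inputs, outputs)` tuples. It is a toy model, not Kedro code: `free_inputs`, `instantiate`, `CATALOG`, `registry`, and every node and dataset name other than `model_input_table` are assumptions introduced for this example.

```python
def free_inputs(pipe):
    """Datasets a pipeline reads but does not produce itself."""
    produced = {out for _, _, outs in pipe for out in outs}
    return {inp for _, ins, _ in pipe for inp in ins} - produced


def instantiate(template, namespace, inputs):
    """Build a namespaced instance of a template.

    Datasets listed in `inputs` are mapped to the given catalog names;
    every other dataset and every node name is prefixed with the namespace.
    """
    def rename(dataset):
        return inputs.get(dataset, f"{namespace}.{dataset}")

    return [
        (f"{namespace}.{name}", [rename(d) for d in ins], [rename(d) for d in outs])
        for name, ins, outs in template
    ]


# Hypothetical catalog entries.
CATALOG = {"model_input_table", "companies"}

# 1. Abstract template: reads "features", which is not a catalog entry.
template = [("train_model", ["features"], ["regressor"])]

# 2. Namespaced instance: the template pointed at a real catalog entry.
ds_1 = instantiate(template, "ds_1", inputs={"features": "model_input_table"})

# 3. Nonmodular pipeline: refers to catalog entries directly, at the root namespace.
preprocess = [("preprocess_companies", ["companies"], ["preprocessed_companies"])]

# Only archetypes 2 and 3 are registered; the template never is.
registry = {"ds_1": ds_1, "preprocess": preprocess}
```

Under these assumptions, the distinguishing check is whether a pipeline's free inputs all resolve to catalog entries: that holds for everything in `registry`, but not for `template`.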
