Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate simplifying the api hierarchy #2441

Closed
AhdraMeraliQB opened this issue Mar 20, 2023 · 5 comments
Closed

Investigate simplifying the api hierarchy #2441

AhdraMeraliQB opened this issue Mar 20, 2023 · 5 comments

Comments

@AhdraMeraliQB
Copy link
Contributor

AhdraMeraliQB commented Mar 20, 2023

Description

#712 was discussed in Tech design, resulting in several points of discussion and some unaddressed questions. Before any progress can be made, these will have to be addressed.

The outcome of this issue should be a plan on how to proceed with #712.

The key comments will be copied over for reference.

@AhdraMeraliQB
Copy link
Contributor Author

AhdraMeraliQB commented Mar 20, 2023

@/merelcht:

Notes from Technical Design session 20/7:

Discussion
This issue was discussed during a Technical Design session and the following points where brought up:

The idea of moving our imports to higher level or even top level for the core Kedro components, sounds like a good idea, but it depends on a couple of things:

  • If you do from kedro import ... , would it import everything under this path? This wouldn't be desirable so we need to investigate if this is the case.

  • Investigate which other packages do top-level imports and how they handle it.
    Instead of moving imports all the way to top-level we could have something like a "shortcuts" or "library" section, which is what Django is doing.

  • Why is Django doing it this way?

  • How deep are the imports? Is it worth changing it? E.g. if an import currently is from kedro.pipelines import Pipeline making it from kedro.shortcuts import Pipeline wouldn't improve anything.

  • If we decide to move components to higher level imports which components would that be and which components from Kedro would we want to be importable from the top level? One good candidate for change is the line magic imports. These are very long and could become from kedro.ipython

  • Is this just for the interactive workflow, (IDE can autocomplete), if so can we put these in ipython extension?

@AhdraMeraliQB
Copy link
Contributor Author

@/noklam

If you do from kedro import ... , would it import everything under this path? This wouldn't be desirable so we need to investigate if this is the case.

I am pretty sure this is the case and it's just how Python works, and there isn't much we can do here.

For example if you have a init.py which just has 1 line.

import kedro.xxx.yyy.zzz
Everytime you do import kedro, it will also import kedro.xxx and kedro.xxx.yyy and kedro.xxx.yyy.zzz.
Equally when people do from xxx.yyy.zzz import something, it doesn't save any memory magically as the top-level library is imported, it just import it into a local namespace and optionally have an alias.

Direct Import
image

From import
image

@merelcht
Copy link
Member

Work with @amandakys on this issue.

@noklam
Copy link
Contributor

noklam commented Mar 27, 2023

Vomiting some initial thoughts (based on what I remember):

  • A lot of libraries have a top-level api module in the package that defines the public API (e.g. from kedro.api import pipeline).

  • Some larger libraries take this further by having api modules for each submodule. Then, the higher-level APIs can import * from each of the submodules.

    • Personal preference: I like when the imports from each submodule are explicit, instead of using import *, because that makes it really difficult to track where some things are actually defined IMO.
  • There are also libraries like https://pypi.org/project/atpublic/ to keep __all__ more maintainable, though I haven't used it (and honestly only seen it used in one place). from - @deepyaman

Overall the only question in my mind here is what exactly the structure should be:

  1. kedro.node, kedro.pipeline as per the initial suggestion
  2. kedro.something.node, kedro.something.pipeline as per @limdauto's idea that imports from the kedro library would contrast with imports from kedro.framework.project (maybe kedro.library.node, kedro.library.pipeline? I don't like that though) - from @AntonyMilneQB

If I summarize it correctly from the long thread. Some of the questions are already answered in the parent issue #712
There are few options here

  1. kedro.node , kedro.pipeline - top level namespace
  2. kedro.something.node, kedro.something.pipeline - One kedro submodule potentially kedro.api module that expose all public level API
  3. kedro.sub_module1.api, kedro.sub_module2.api - submodule level API

(1) Maybe the easiest for a user to discover, but I am highly against it because it pollutes the kedro namespace. In short it will creates dependencies for all kedro submodules - see #2441 (comment)
(2) I like this option the best so far - compromised and relatively easy to remember.
(3) Overhead is large - Kedro public API isn't that large and I don't feel a strong need for sub-module level API alias.

Independently of the options above

  • We should use __all__ anyway to avoid users importing stuff we don't want.
  • This new API should be only user-facing. Any code within the code base should keep the import path.
from kedro.framework.pipeline import Pipeline  # This is correct
from kedro.api import Pipeline  # We shouldn't do this in our codebase because it creates the "every module depends on each other" problem

@merelcht
Copy link
Member

Closing this as well. Like @astrojuanlu mentioned on #712 this is not something we'll be addressing at the moment. If you feel like we should redesign our API, please leave a comment.

@merelcht merelcht closed this as not planned Won't fix, can't repro, duplicate, stale Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants