More control over folder structure #2553
I would like to piggyback on this issue as we are trying to achieve a more "organized" project structure by separating the
Customizing the paths for the project structure, such as moving the
At the moment the only hardcoded paths are

One solution to remove those paths would be to use entry points so that Kedro projects advertise where their

```toml
# pyproject.toml
[project.entry-points."spaceflights.kedro"]
register_pipelines = "spaceflights.pipelines.pipeline_registry:register_pipelines"  # A function
settings = "spaceflights.settings"  # A module
```

And then

```python
In [1]: from importlib.metadata import entry_points

In [2]: kedro_eps = entry_points(group="spaceflights.kedro")

In [3]: kedro_eps["register_pipelines"].load()
Out[3]: <function spaceflights.pipelines.pipeline_registry.register_pipelines()>

In [4]: kedro_eps["settings"].load()
Out[4]: <module 'spaceflights.settings' from '/private/tmp/test-eps/src/spaceflights/settings.py'>
```

(crazy idea, all names are bikesheddable, and probably there are implications I didn't consider)

Another idea would be to write those directly in the

Another idea could be to designate a way to say in

And I don't think there are more possibilities, unless I'm missing something.
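To make the entry-point idea above a bit more concrete, here is a minimal sketch of how a loader could collect whatever a project advertises. This is not Kedro code: the function name `load_project_hooks` and the `<package>.kedro` group convention are assumptions taken from the example above, and `entry_points(group=...)` needs Python 3.10 or newer.

```python
from importlib.metadata import EntryPoint, entry_points
from typing import Any, Iterable, Optional


def load_project_hooks(
    package_name: str,
    eps: Optional[Iterable[EntryPoint]] = None,
) -> dict[str, Any]:
    """Load every object advertised under the "<package_name>.kedro" group.

    Hypothetical helper: the group name follows the pyproject.toml
    example above and is not an existing Kedro convention.
    """
    group = f"{package_name}.kedro"
    if eps is None:
        # Query the entry points of the installed distributions.
        eps = entry_points(group=group)
    # EntryPoint.load() imports the module and, if the value contains
    # "module:attr", returns that attribute instead of the module.
    return {ep.name: ep.load() for ep in eps if ep.group == group}
```

Passing `eps` explicitly makes it possible to try the function without installing a project, for example with `EntryPoint("settings", "json", "spaceflights.kedro")` as a stand-in for a real settings module.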
@notniknot From what I see, your pipelines structure should work already. Is the only problem that you cannot move
@fmfreeze You mentioned two different points.
@noklam Yes, moving the
To communicate this better, the only fixed point is

@notniknot @fmfreeze Is this something that is still desired? On the implementation side this is quite simple, and I don't think it will break anything. We already have

```python
def bootstrap_project(project_path: str | Path) -> ProjectMetadata:
    """Run setup required at the beginning of the workflow
    when running in project mode, and return project metadata.
    """
    project_path = Path(project_path).expanduser().resolve()
    metadata = _get_project_metadata(project_path)
    _add_src_to_path(metadata.source_dir, project_path)
    configure_project(metadata.package_name)
    return metadata
```
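For readers unfamiliar with what such a bootstrap step amounts to, the following is a simplified stand-in, not Kedro's implementation: it puts a source directory on `sys.path` and then imports a module by its dotted name. The helper name `import_project_module` is hypothetical, and the idea that the module name could come from configuration is an assumption.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def import_project_module(source_dir: Path, module_name: str) -> ModuleType:
    """Make `source_dir` importable, then import `module_name` from it.

    Hypothetical helper for illustration only.
    """
    source_dir = Path(source_dir).expanduser().resolve()
    if str(source_dir) not in sys.path:
        # Prepend so the project's packages are found first.
        sys.path.insert(0, str(source_dir))
    # Ensure freshly created files are visible to the import system.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

With a helper like this, the location of `settings.py` would only matter through the dotted module name that gets passed in, e.g. `import_project_module(Path("my_name"), "spaceflights.settings")`.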
Closing this as there hasn't been a lot of demand for it. We'll document the minimal requirements for Kedro project structure in #2512.
On the Slack channel, @astrojuanlu, @deepyaman and I discussed the possibility of configuring kedro so it knows about and works with a custom folder structure.
E.g. the `src` folder in a kedro repo simply has a different name. That turned out to be as easy as adding `source_dir = "my_name"` to the `pyproject.toml` file, so kedro successfully runs with that structure:
But it would be even better to have more config control over the folder structure, to let kedro work with e.g. a structure with a `kedro_src` folder bundling `settings.py` and `pipeline_registry.py` (and maybe even `__main__.py`?):
But at the moment those paths are hardcoded:
kedro/kedro/framework/project/__init__.py
Lines 256 to 259 in 2e70dec
Wouldn't that be great? :)
But why would it be great?
In organisations there are often already established cookiecutter templates for whatever data projects they run. Kedro would be easier to integrate into those.
In a scientific environment, as a data engineer I try to help the scientists focus on their actual tasks, and not on (boilerplate) infrastructure. The easier a tool like kedro can be integrated, the more it avoids confusion among our scientists (who are not SW devs, and don't have to be :)
Minor edits by @astrojuanlu