[DataCatalog]: Provide public methods to modify catalog #3930

Open
ElenaKhaustova opened this issue Jun 5, 2024 · 1 comment
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@ElenaKhaustova
Contributor

Description

Plugin developers and advanced users are limited by the absence of public methods for modifying catalog datasets and for injecting dynamic behaviour or configuration parameters on the fly during pipeline execution. These limitations are intentional: the corresponding public APIs were deliberately not provided. In practice, however, users bypass them by relying on private APIs.

We propose to:

  1. Rethink the concept of keeping DataCatalog immutable.
  2. Explore the feasibility of providing a public API for modifying catalog datasets and configuration parameters, enabling users to adapt the pipeline's behaviour in response to changing runtime requirements or environmental conditions.

Relates to #2728

Context

  • Users need the ability to view and modify information within the Data Catalog dynamically during pipeline execution. This includes injecting dynamic data or swapping dataset implementations to accommodate varying runtime requirements.

https://github.com/Galileo-Galilei/kedro-mlflow/blob/64b8e94e1dafa02d979e7753dab9b9dfd4d7341c/kedro_mlflow/framework/hooks/mlflow_hook.py#L145


  • Plugin developers are interested in checking the dataset's type and injecting dynamic behaviour based on that type. They want to determine whether a dataset belongs to a certain class or type and then modify its parameters or behaviour accordingly, such as configuring it based on their environment or integration needs.

https://github.com/getindata/kedro-azureml/blob/d5c2011c7ed7fdc03235bf2bd6701f1901d1139c/kedro_azureml/hooks.py#L20

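To make the two patterns above concrete, here is a minimal, self-contained sketch. It does not reproduce the linked plugin code and does not use the Kedro API: `ToyCatalog`, `LocalDataset`, `RemoteDataset`, `TrackedDataset` and `after_catalog_created_workaround` are all hypothetical names invented for illustration. The only assumption carried over from the description is that the catalog keeps its datasets in a private attribute and offers no public way to change them.

```python
from typing import Any, Dict, List, Optional


class LocalDataset:
    """Hypothetical dataset that simply holds a value in memory."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class RemoteDataset(LocalDataset):
    """Hypothetical dataset that needs environment-specific configuration."""

    def __init__(self, data: Any = None, storage_account: Optional[str] = None) -> None:
        super().__init__(data)
        self.storage_account = storage_account


class TrackedDataset(LocalDataset):
    """Hypothetical replacement implementation that counts how often it is loaded."""

    def __init__(self, data: Any = None) -> None:
        super().__init__(data)
        self.load_count = 0

    def load(self) -> Any:
        self.load_count += 1
        return super().load()


class ToyCatalog:
    """Toy catalog: datasets live in a private dict and there is no public setter."""

    def __init__(self, datasets: Dict[str, LocalDataset]) -> None:
        self._datasets = dict(datasets)

    def load(self, name: str) -> Any:
        return self._datasets[name].load()


def after_catalog_created_workaround(catalog: ToyCatalog, storage_account: str) -> List[str]:
    """Mimic what a plugin hook has to do today: reach into private state."""
    touched = []
    for name, dataset in list(catalog._datasets.items()):  # private access
        if isinstance(dataset, RemoteDataset):
            # Pattern 1: check the type, then inject environment-specific settings.
            dataset.storage_account = storage_account
            touched.append(name)
        elif name == "metrics":
            # Pattern 2: swap the dataset implementation for this run.
            catalog._datasets[name] = TrackedDataset(dataset.load())
            touched.append(name)
    return touched


catalog = ToyCatalog(
    {
        "raw": RemoteDataset(data=[1, 2, 3]),
        "metrics": LocalDataset(data={"accuracy": 0.9}),
        "notes": LocalDataset(data="unchanged"),
    }
)
touched = after_catalog_created_workaround(catalog, storage_account="dev-account")
```

Both branches depend on `catalog._datasets`, which is exactly the kind of private access a public modification API would make unnecessary.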

@ElenaKhaustova ElenaKhaustova added the Issue: Feature Request New feature or improvement to existing feature label Jun 5, 2024
@astrojuanlu
Member

Adding a few more examples:

There's general agreement that we don't necessarily want to make all mutations of the catalog easy (like crazy injection of datasets in the middle of the lifecycle), but maybe there are more ways we can open up the collection of datasets just before the catalog is first instantiated for the rest of the run.

For interactive use, on the other hand, building the DataCatalog in an imperative way seems unnecessary, and there are other possibilities we can offer: #3612 (comment)
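One way to picture "opening up the collection of datasets just before the catalog is first instantiated" is the sketch below. It is only an illustration of the idea, not a proposed Kedro design: `FrozenCatalog`, `build_catalog` and the contribution callables are hypothetical, and the read-only behaviour is approximated with the standard library's `types.MappingProxyType`.

```python
from types import MappingProxyType
from typing import Any, Callable, Dict, Iterable

# A contribution receives the staged (still mutable) dataset mapping and may edit it.
Contribution = Callable[[Dict[str, Any]], None]


class FrozenCatalog:
    """Hypothetical catalog whose dataset collection is read-only once built."""

    def __init__(self, datasets: Dict[str, Any]) -> None:
        # Copy the input, then expose it only through a read-only view.
        self._datasets = MappingProxyType(dict(datasets))

    def load(self, name: str) -> Any:
        return self._datasets[name]


def build_catalog(base: Dict[str, Any], contributions: Iterable[Contribution]) -> FrozenCatalog:
    """Let plugins adjust the staged datasets, then freeze them for the rest of the run."""
    staged = dict(base)
    for contribute in contributions:
        contribute(staged)
    return FrozenCatalog(staged)


def add_tracking(staged: Dict[str, Any]) -> None:
    # Hypothetical plugin contribution: register an extra entry before the freeze.
    staged["tracking_uri"] = "file:./mlruns"


def override_raw(staged: Dict[str, Any]) -> None:
    # Hypothetical plugin contribution: replace an existing entry before the freeze.
    staged["raw"] = [10, 20]


frozen = build_catalog({"raw": [1, 2]}, [add_tracking, override_raw])

# After construction, item assignment on the read-only view fails.
try:
    frozen._datasets["late"] = "injected"
    late_injection_blocked = False
except TypeError:
    late_injection_blocked = True
```

In this sketch all changes happen in one well-defined window before the catalog exists, and attempts to inject entries later are rejected, which matches the distinction drawn above between pre-instantiation changes and mid-lifecycle injection.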


2 participants