[DataCatalog]: Lazy dataset loading #3935

ElenaKhaustova · 2024-06-06T12:05:16Z

Description

Users currently have to install the dependencies of every dataset in the catalog, even datasets they never use, which adds unnecessary complexity and confusion.

We propose implementing a lazy dataset loading feature to allow users to load only the datasets they need without causing pipeline failures.
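To make the idea concrete, here is a minimal sketch of lazy resolution. It does not use Kedro's actual API: `LazyCatalog`, its `get` method, and the `not_installed_pkg.ExcelDataset` type are hypothetical names for illustration, and a standard-library class stands in for an installed dataset type. The catalog keeps only the raw configuration and imports a dataset's class the first time that dataset is requested, so a missing dependency for one entry does not block the others.

```python
import importlib
from typing import Any, Dict


class LazyCatalog:
    """Hypothetical catalog that stores dataset configs and resolves them on demand."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        # Only the raw configuration is kept here; nothing is imported yet.
        self._config = config
        self._instances: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        # Import and instantiate the dataset class on first access, then cache it.
        if name not in self._instances:
            entry = dict(self._config[name])
            module_path, _, class_name = entry.pop("type").rpartition(".")
            try:
                module = importlib.import_module(module_path)
            except ImportError as exc:
                raise RuntimeError(
                    f"Dataset '{name}' needs '{module_path}', "
                    f"which could not be imported: {exc}"
                ) from exc
            self._instances[name] = getattr(module, class_name)(**entry)
        return self._instances[name]


catalog = LazyCatalog(
    {
        # Standard-library class standing in for an installed dataset type.
        "week": {"type": "datetime.timedelta", "days": 7},
        # Stand-in for a dataset whose dependency is not installed.
        "reports": {"type": "not_installed_pkg.ExcelDataset", "filepath": "reports.xlsx"},
    }
)

# Building the catalog succeeded even though "reports" cannot be imported;
# only a call to catalog.get("reports") would raise.
week = catalog.get("week")
```

With this approach the failure moves from catalog construction to the first access of the affected dataset, which also makes it possible to name the dataset and the missing module in the error.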

Relates to #2829

Context

"You need to install all dependencies even for unused datasets (in case you want to run pipeline partially or do not load some dataset when standalone catalog usage)."
"We have a lot of data entries and different dependencies and when we just want to rerun an analysis partially, we are frustrated because we need to install all the packages to just load one data source. Why would I need to install excel dependencies to instantiate the DataCatalog to load a csv which does not need Excel?"
The error users currently get when dependencies are missing is unclear (see "[DataCatalog]: Error message is confusing when kedro-dataset is not installed", #3911):

DatasetError: An exception occurred when parsing config for dataset 'companies':
No module named 'pandas'. Please see the documentation on how to install relevant dependencies for kedro_datasets.pandas.CSVDataset:


ElenaKhaustova added the Issue: Feature Request New feature or improvement to existing feature label Jun 6, 2024

ElenaKhaustova added this to the Redesign the API for IO (catalog) milestone Jun 6, 2024

ElenaKhaustova mentioned this issue Jun 6, 2024

Research summary of insights for redesigning Kedro's data catalog API #3934

Open

github-actions bot mentioned this issue Jul 1, 2024

Monthly issue metrics report #3975

Open

ElenaKhaustova mentioned this issue Aug 8, 2024

Design DataCatalog2.0 #3995

Open

3 tasks
