[DataCatalog]: Lazy dataset loading #3935

Open
ElenaKhaustova opened this issue Jun 6, 2024 · 0 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Description

Users are required to install all dependencies even for unused datasets, leading to unnecessary complexity and confusion.

We propose implementing a lazy dataset loading feature to allow users to load only the datasets they need without causing pipeline failures.

Relates to #2829

Context

  • "You need to install all dependencies even for unused datasets (in case you want to run pipeline partially or do not load some dataset when standalone catalog usage)."
  • "We have a lot of data entries and different dependencies and when we just want to rerun an analysis partially, we are frustrated because we need to install all the packages to just load one data source. Why would I need to install excel dependencies to instantiate the DataCatalog to load a csv which does not need Excel?"
  • The error users currently get when a dependency is missing is unclear (see #3911, "[DataCatalog]: Error message is confusing when kedro-dataset is not installed"):
    DatasetError: An exception occurred when parsing config for dataset 'companies':
    No module named 'pandas'. Please see the documentation on how to install relevant dependencies for kedro_datasets.pandas.CSVDataset:
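
To make the proposal concrete, here is a minimal sketch of what lazy loading could look like. It is not Kedro's implementation: `LazyCatalog`, its `get` method, and the `missing_excel_backend.ExcelDataset` entry are hypothetical names used only for illustration, and `fractions.Fraction` from the standard library stands in for a dataset class whose dependency is installed. The idea is that the catalog stores the raw config at construction time and imports a dataset's module only when that dataset is first requested.

```python
import importlib
from typing import Any, Dict


class LazyCatalog:
    """Hypothetical catalog that defers dataset imports until first access."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        # Keep the raw config only; no dataset module is imported here.
        self._config = config
        self._datasets: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        # Build the dataset on first access and cache it for later calls.
        if name not in self._datasets:
            self._datasets[name] = self._materialise(name)
        return self._datasets[name]

    def _materialise(self, name: str) -> Any:
        entry = dict(self._config[name])
        module_path, _, class_name = entry.pop("type").rpartition(".")
        try:
            module = importlib.import_module(module_path)
        except ModuleNotFoundError as exc:
            # Name the dataset and the missing module explicitly.
            raise RuntimeError(
                f"Dataset '{name}' needs module '{exc.name}', "
                "which is not installed."
            ) from exc
        return getattr(module, class_name)(**entry)


config = {
    # Stand-in for a dataset whose dependency is available.
    "half": {"type": "fractions.Fraction", "numerator": 1, "denominator": 2},
    # Stand-in for a dataset whose dependency is NOT installed (assumed name).
    "reports": {"type": "missing_excel_backend.ExcelDataset", "filepath": "reports.xlsx"},
}

# Construction succeeds even though one entry has a missing dependency.
catalog = LazyCatalog(config)
half = catalog.get("half")
```

In this sketch a missing dependency only surfaces when `catalog.get("reports")` is called, so a partial pipeline run that never touches `reports` would not fail. The trade-off is that configuration errors for unused datasets are detected later rather than at catalog creation.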