[DataCatalog2.0]: Draft of AbstractDataCatalog and KedroDataCatalog (work in progress) #4070

Closed
wants to merge 26 commits into from

ElenaKhaustova
Contributor

@ElenaKhaustova ElenaKhaustova commented Aug 6, 2024

Description

Solves #3925, #3926, #3916

Development notes

This PR includes a draft of the following:

  1. Implement a draft of AbstractDataCatalog and KedroDataCatalog(AbstractDataCatalog):
  • AbstractDataCatalog supports instantiation from configuration and/or datasets
  • AbstractDataCatalog stores the provided configuration
  2. Rework dataset pattern resolution logic:
  • Pattern resolution logic moved out of _get_dataset() into resolve_patterns()
  • Pattern resolution logic split into the resolution itself and the updating of datasets/configurations
  • _dataset_patterns and _default_patterns are now obtained from the config in __init__
  • Added a resolved_ds_configs property to store the resolved datasets' configurations at the catalog level
  • The add() method adds or replaces the dataset and its configuration
  • add_feed_dict() renamed to add_from_dict()
  • Introduced a _runtime_patterns catalog field to keep the processing of dataset, default and runtime patterns clearly separated
  • Removed the shallow_copy() method, previously used to add extra_dataset_patterns at runtime, and replaced it with a dedicated add_runtime_patterns() method
  3. Rework dataset access logic:
  • Removed _FrozenDatasets and access to datasets as properties
  • Added getting a dataset by name, both via a dedicated function and by key access
  • Added iteration over the datasets
  • We still do not allow modifying the datasets property directly, but do allow add(replace=True)
  4. Make KedroDataCatalog mutable:
  • We do not want to expose the datasets dictionary publicly, so as not to encourage users to configure the catalog by modifying it directly
  • The _datasets property remains protected, but a public datasets property was added that returns a deep copy of _datasets; a setter is still not allowed. The same applies to the _resolved_ds_configs property
  • One can still extend and replace _datasets via the catalog.add() method
  5. To keep AbstractDataCatalog compatible with the current runners' implementation, the methods release(), confirm() and exists() were kept as part of the interface, but they only have a meaningful implementation in KedroDataCatalog.
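The pattern-resolution and dataset-access changes above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: SketchCatalog, FakeDataset, _match() and _fill() are hypothetical names, datasets are stood in for by plain config holders, placeholder matching uses a simple regex rather than whatever parser Kedro actually uses, the separate default-pattern tier is omitted, and the precedence (explicit entries, then dataset patterns, then runtime patterns) is an assumption for illustration. It only mirrors the method names listed in the notes: resolve_patterns(), add(replace=...), add_from_dict(), add_runtime_patterns(), get_dataset(), key access, iteration, and datasets/resolved_ds_configs properties that return deep copies and have no setter.

```python
from __future__ import annotations

import copy
import re
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class FakeDataset:
    """Hypothetical placeholder for a real dataset object."""
    config: dict[str, Any] = field(default_factory=dict)


def _match(pattern: str, name: str) -> dict[str, str] | None:
    """Return placeholder values if `name` matches a `{placeholder}` pattern."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even indices are literal text, odd indices are placeholder names.
    regex = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+?)" for i, p in enumerate(parts)
    )
    m = re.fullmatch(regex, name)
    return m.groupdict() if m else None


def _fill(cfg: Any, values: dict[str, str]) -> Any:
    """Substitute placeholder values into every string of a (nested) config."""
    if isinstance(cfg, dict):
        return {k: _fill(v, values) for k, v in cfg.items()}
    if isinstance(cfg, str):
        return cfg.format(**values)
    return cfg


class SketchCatalog:
    """Hypothetical, heavily simplified stand-in for the draft KedroDataCatalog."""

    def __init__(self, config: dict | None = None, datasets: dict | None = None):
        config = config or {}
        # Pattern entries are split from explicit entries once, at construction time.
        self._dataset_patterns = {k: v for k, v in config.items() if "{" in k}
        self._resolved_ds_configs = {k: v for k, v in config.items() if "{" not in k}
        self._runtime_patterns: dict[str, dict] = {}
        self._datasets: dict[str, Any] = dict(datasets or {})

    def add_runtime_patterns(self, patterns: dict[str, dict]) -> None:
        self._runtime_patterns.update(patterns)

    def resolve_patterns(self, name: str) -> dict | None:
        """Resolve a config only; this step does not touch the catalog state."""
        if name in self._resolved_ds_configs:
            return self._resolved_ds_configs[name]
        # Assumed precedence: dataset patterns first, then runtime patterns.
        for patterns in (self._dataset_patterns, self._runtime_patterns):
            for pattern, cfg in patterns.items():
                values = _match(pattern, name)
                if values is not None:
                    return _fill(cfg, values)
        return None

    def add(self, name: str, dataset: Any, replace: bool = False) -> None:
        if name in self._datasets and not replace:
            raise ValueError(f"Dataset '{name}' already exists")
        self._datasets[name] = dataset

    def add_from_dict(self, data: dict[str, Any], replace: bool = False) -> None:
        for name, dataset in data.items():
            self.add(name, dataset, replace=replace)

    def get_dataset(self, name: str) -> Any:
        if name not in self._datasets:
            cfg = self.resolve_patterns(name)
            if cfg is None:
                raise KeyError(name)
            # Updating the catalog is a separate step from resolution.
            self._resolved_ds_configs[name] = cfg
            self.add(name, FakeDataset(cfg))
        return self._datasets[name]

    __getitem__ = get_dataset  # access by key

    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)

    @property
    def datasets(self) -> dict[str, Any]:
        # A deep copy: mutating it cannot change the catalog; no setter is defined.
        return copy.deepcopy(self._datasets)

    @property
    def resolved_ds_configs(self) -> dict[str, dict]:
        return copy.deepcopy(self._resolved_ds_configs)
```

Keeping resolve_patterns() free of side effects means a resolved config can be inspected without creating a dataset, while get_dataset() performs the separate update of _datasets and _resolved_ds_configs. In this sketch, iteration only yields datasets that have been added or materialised, not every name a pattern could match.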

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@ElenaKhaustova ElenaKhaustova changed the title [DataCatalog2.0]: Refactor dataset factory resolution logic (work in progress) [DataCatalog2.0]: Draft of AbstractDataCatalog and KedroDataCatalog (work in progress) Aug 7, 2024
@ElenaKhaustova ElenaKhaustova changed the base branch from main to 3995-data-catalog-2.0 August 8, 2024 09:05
@ElenaKhaustova ElenaKhaustova mentioned this pull request Aug 8, 2024
@ElenaKhaustova
Contributor Author

Replaced by #4123
