-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve add_feed_dict
docstrings
#4009
Conversation
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
I would also suggest replacing |
kedro/io/data_catalog.py
Outdated
@@ -681,29 +681,41 @@ def add_all( | |||
self.add(name, dataset, replace) | |||
|
|||
def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None: | |||
"""Adds instances of ``MemoryDataset``, containing the data provided | |||
through feed_dict. | |||
"""This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a style guide for docstrings yet? In any case, following Pydocstyle D404 and D401,
"""This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`. | |
"""Add datasets to the ``DataCatalog`` using the data provided through the `feed_dict`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add additional ?
"""This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`. | |
"""Add additional datasets to the ``DataCatalog`` using the data provided through the `feed_dict`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't fully understand why we need additional
here, as we can add them to the empty catalog as well. So, it looks like we do not need to specify any adding conditions.
kedro/io/data_catalog.py
Outdated
@@ -713,6 +725,9 @@ def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> Non | |||
|
|||
self.add(dataset_name, dataset, replace) | |||
|
|||
# Alias for add_feed_dict method | |||
from_dict = add_feed_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantic nitpick, but when I read from_dict
I assume it's a classmethod, like
class DataCatalog:
@classmethod
def from_dict(cls, data_dict: dict[str, AbstractDataset]):
...
return cls
So not sure if I'd add this alias.
(I know this comes from #3612 (comment) and that I asserted, but my response there is incomplete)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually quite a significant change. I think we should discuss first what a good alternative is, before just adding an alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has always been an undiscovered part of the API, I agree with @merelcht that we should think really hard / research what a better name should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed an alias until we are sure about adding it and agree on its name.
To @astrojuanlu's point, if we decide to keep an alias, we can use the add_from_dict
name instead.
Or, as an alternative, we can add a new implementation (add_from_dict
/ add_datasets
) and use it inside add_feed_dict
with a deprecation warning.
kedro/io/data_catalog.py
Outdated
`feed_dict` dictionary key is used as a dataset name, and a value is used to create an instance of | ||
``MemoryDataset`` before adding to the ``DataCatalog`` for all the value types except of ``AbstractDataset``. | ||
In the last case, the ``AbstractDataset`` is added as it is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about
`feed_dict` dictionary key is used as a dataset name, and a value is used to create an instance of | |
``MemoryDataset`` before adding to the ``DataCatalog`` for all the value types except of ``AbstractDataset``. | |
In the last case, the ``AbstractDataset`` is added as it is. | |
`feed_dict` keys are used as dataset names, and values can either be raw data or instances of ``AbstractDataset``. | |
In the former case, instances of ``MemoryDataset`` are automatically created before adding to the ``DataCatalog``. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't think mentioning AbstractDataset
is useful here, we shouldn't really have instances of an interface right? Does the user gain anything for understanding this implementation detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I rephrased it, but I still think it's useful to distinguish between the two cases when adding raw data and a dataset.
kedro/io/data_catalog.py
Outdated
@@ -713,6 +725,9 @@ def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> Non | |||
|
|||
self.add(dataset_name, dataset, replace) | |||
|
|||
# Alias for add_feed_dict method | |||
from_dict = add_feed_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually quite a significant change. I think we should discuss first what a good alternative is, before just adding an alias.
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
Signed-off-by: Elena Khaustova <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much clearer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! ⭐
Description
Solves #3612
Development notes
DataCatalog
'sadd_feed_dict
method and extended them with an additional example.form_dict
alias foradd_feed_dict
method and added a note to the docstring that the method will be renamed in the0.20.0
release.Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file