Narrow down kedro-datasets public API #139

astrojuanlu · 2023-03-22T16:09:16Z

Description

Right now there are more or less two ways of importing any given dataset:

>>> from kedro_datasets.biosequence import BioSequenceDataSet
>>> from kedro_datasets.biosequence.biosequence_dataset import BioSequenceDataSet as BioSequenceDataSet2
>>> BioSequenceDataSet is BioSequenceDataSet2
True

I think we should communicate to users that only the former belongs to the public API.

Context

Reasons to do this:

If we want to refactor the internals down the line, a narrower public API gives us more leeway to rename or move things around in a non-breaking way (by leveraging the implicit convention that the private API might break at any time).
For users just skimming the codebase or using IDE autocomplete capabilities, it's more obvious how they should import the datasets.
Certain tools for automatic API documentation generation choke when there are duplicated entities (see "Duplicates in auto-generated documents", readthedocs/sphinx-autoapi#358) or need to handle this case explicitly using __all__ or similar (see https://sphinx-autodoc2.readthedocs.io/en/latest/quickstart.html#documenting-only-the-public-api-via-all).

Possible Implementation

Renaming the appropriate submodules to prepend an underscore. For example, kedro_datasets/biosequence/biosequence_dataset.py would become kedro_datasets/biosequence/_biosequence_dataset.py.
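
To make the proposal concrete, here is a minimal sketch of the resulting layout. It builds a throwaway package in a temporary directory rather than touching kedro-datasets itself; the package name demo_datasets and the empty class body are hypothetical stand-ins, and the content of the real __init__.py files is an assumption.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mimics the proposed layout.
# "demo_datasets" is a hypothetical stand-in for kedro_datasets.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_datasets" / "biosequence"
pkg.mkdir(parents=True)
(root / "demo_datasets" / "__init__.py").write_text("")

# Private module: the leading underscore signals "not part of the public API".
(pkg / "_biosequence_dataset.py").write_text(
    "class BioSequenceDataSet:\n    pass\n"
)

# Public entry point: re-export the class and list it in __all__.
(pkg / "__init__.py").write_text(
    "from ._biosequence_dataset import BioSequenceDataSet\n"
    "__all__ = ['BioSequenceDataSet']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The documented import path keeps working.
public = importlib.import_module("demo_datasets.biosequence")
print(public.__all__)

# The old, non-underscored submodule path no longer exists.
try:
    importlib.import_module("demo_datasets.biosequence.biosequence_dataset")
except ModuleNotFoundError:
    old_path_works = False
else:
    old_path_works = True
print(old_path_works)
```

With this layout only the package-level import remains as a supported path; anyone importing from the underscored module is knowingly relying on private API.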

Possible Alternatives

Haven't considered any alternatives.

antonymilne · 2023-03-24T10:07:36Z

Love this idea, and I've thought similar things in the past. Actually I think similar arguments could be made for parts of the framework too (e.g. the pipeline import springs to mind), where the import paths and public API are not great - see kedro-org/kedro#712. But kedro-datasets definitely seems like a good place to start. It's a breaking change but probably not that breaking in this case, because the vast majority of references to datasets will be through the catalog, in which case the canonical way to refer to them is through the higher-level type: biosequence.BioSequenceDataSet rather than the lower-level type: biosequence.biosequence_dataset.BioSequenceDataSet (which would still work at the moment I think, and I suspect some people do it that way unintentionally).

The one thing that's put me off this in the past is that the only way I could think of doing it is as you suggest, by prepending underscores, which I just find looks a bit weird. But it's Python convention I know, so that's not a strong argument against it... Also I actually don't know if there's a problem using autocomplete imports at the moment? In PyCharm if I try to automatically import BioSequenceDataSet it already does it as the higher-level from kedro_datasets.biosequence import BioSequenceDataSet and doesn't give the option of the lower-level import. I don't know what mechanism it uses to work that out or how it works in other IDEs though. E.g. are there IDEs which don't currently work that way but would change to working the way we want if we added the _?

astrojuanlu · 2023-03-24T15:20:28Z

I think the IDE might work this out by seeing the __all__ attribute in __init__.py - which is excellent, but still potentially someone could consider importing from the submodule.
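
A small sketch of why __all__ on its own doesn't close the door: it governs what a star-import exposes (and what some tools treat as public), but an explicit import of any other name still succeeds. The module demo_mod and the names Public and Hidden are hypothetical, created in memory purely for illustration.

```python
import sys
import types

# Create an in-memory module that declares only one public name.
mod = types.ModuleType("demo_mod")
exec(
    "__all__ = ['Public']\n"
    "class Public: pass\n"
    "class Hidden: pass\n",
    mod.__dict__,
)
sys.modules["demo_mod"] = mod

# A star-import honours __all__ and only picks up Public.
ns = {}
exec("from demo_mod import *", ns)
star_names = sorted(name for name in ns if not name.startswith("__"))
print(star_names)

# An explicit import of a name outside __all__ still works.
from demo_mod import Hidden
print(Hidden.__name__)
```

So __all__ helps tooling, while the underscore rename is what actually communicates (and, for the old path, enforces) the boundary.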

To me the underscore convention looks natural, but that's a matter of taste I guess 😄

deepyaman · 2023-04-24T13:55:26Z

Technically a breaking change, but no objections (so far), so need to migrate to the core Kedro repo and add to the 0.19 milestone.

astrojuanlu · 2023-08-31T16:47:31Z

Does this have to be on 0.19 though? Since kedro-datasets is its own package, I was thinking that maybe this would belong to kedro-datasets 2.0.

astrojuanlu added the enhancement New feature or request label Mar 22, 2023

astrojuanlu mentioned this issue Mar 24, 2023 in Initial Sphinx revamp kedro-org/kedro#2459 (merged)

merelcht added this to the Improvements to datasets as a whole milestone Jan 12, 2024
