kedro-datasets 1.5.2 breaks `pip install kedro-datasets[all]` #306

noklam · 2023-08-15T16:39:26Z

Description

Running pip install kedro-datasets[all]==1.5.2 produces these warnings:

WARNING: kedro-datasets 1.5.2 does not provide the extra ‘api-apidataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘biosequence-biosequencedataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘dask-parquetdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘databricks-managedtabledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘geopandas-geojsondataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘holoviews-holoviewswriter’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘matplotlib-matplotlibwriter’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘networkx-networkxdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘pickle-pickledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘pillow-imagedataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘plotly-jsondataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘plotly-plotlydataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘polars-csvdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘redis-pickledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘snowflake-snowparktabledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-deltatabledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-sparkdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-sparkhivedataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-sparkjdbcdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘svmlight-svmlightdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘tensorflow-tensorflowmodeldataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘video-videodataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘yaml-yamldataset’

astrojuanlu · 2023-08-16T07:21:52Z

The root cause of these problems is that the kedro-datasets extras names just don't align with Python packaging standards: https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-use

Provides-Extra (multiple use)
Changed in version 2.3: PEP 685 restricted valid values to be unambiguous (i.e. no normalization required).
...
A string containing the name of an optional feature. A valid name consists only of lowercase ASCII letters, ASCII numbers, and hyphen.

Therefore, pip install kedro-datasets[pandas.CSVDataSet] is out of line with current packaging standards.

From https://peps.python.org/pep-0685/:

When comparing extra names, tools MUST normalize the names being compared using the semantics outlined in PEP 503 for names:
re.sub(r"[-_.]+", "-", name).lower()
The core metadata specification will be updated such that the allowed names for Provides-Extra matches what PEP 508 specifies for names.
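
To make the normalization rule concrete, here is a minimal sketch that applies the PEP 685 expression quoted above to a few dotted extras. The helper name `normalize_extra` and the sample list are illustrative assumptions, not code from kedro-datasets or pip.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name using the expression quoted from PEP 685."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Illustrative extras in the dotted style discussed in this thread (assumed list).
requested = ["api.APIDataSet", "pandas.CSVDataSet", "spark.SparkJDBCDataSet"]

for name in requested:
    print(f"{name} -> {normalize_extra(name)}")
```

The normalized forms have the same shape as the names in the warnings above, for example api-apidataset and spark-sparkjdbcdataset.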

So, why was this working before? See also in the PEP:

Tools generating metadata MUST raise an error if a user specified two or more extra names which would normalize to the same name.

The reason is: setuptools hasn't fully implemented PEP 685 yet: pypa/setuptools#3586

So, our mangling of extras in setup.py before #263 worked, but after transitioning to self-referential extras, we fell victim to name normalization.
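
One way to picture the mismatch is the simplified model below. It assumes the installer normalizes the extra requested by a self-referential dependency but compares it against the declared names exactly as they were written to the metadata. This is a sketch under that assumption, not pip's actual resolution code; `declared_extras` and `missing_extras` are hypothetical names.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name using the expression quoted from PEP 685."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical Provides-Extra values, written to the metadata without normalization.
declared_extras = {"api", "api.APIDataSet", "all"}


def missing_extras(requested, declared):
    """Return normalized requested extras that have no exact match in `declared`."""
    normalized = [normalize_extra(r) for r in requested]
    return [n for n in normalized if n not in declared]


# A self-referential extra such as api = ["kedro-datasets[api.APIDataSet]"]
# ends up asking for the normalized name, which is not declared as such.
print(missing_extras(["api.APIDataSet"], declared_extras))  # ['api-apidataset']
print(missing_extras(["api", "all"], declared_extras))  # []
```

Under this model, only extras whose names change during normalization go missing, which would be consistent with the warnings listing dataset-level names only.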

I'm voting against switching to Poetry, Hatch, PDM, or any other system that allows this behavior, because it's going to bite us in the future. We need a short-term solution (either revert #263 or go for @DimedS #307) and a long-term solution (possibly deprecating extras names with dots and offering an alternative syntax for our users).

DimedS · 2023-08-16T08:03:44Z

Good points. I completely agree that the central issue is naming, and we need to maintain consistency with the standards going forward. Changing names would be a breaking release. Should we include this in the 0.19 scope? Additionally, I concur that transitioning to Poetry or other alternatives should be a long-term strategic decision, with a thorough analysis of its pros and cons.

merelcht · 2023-08-16T08:49:14Z

The naming change would just be for kedro-datasets, right? We could just release 2.0.0 for datasets with that breaking change.

DimedS · 2023-08-16T09:26:14Z

In any case, we need a new release. Should we aim for 2.0.0 soon? Alternatively, would a temporary solution with version 1.5.3 be better?

noklam · 2023-08-16T09:44:00Z

I am in favor of reverting to setup.py now. I think a short-term solution is enough. This isn't a functional change, i.e. it does not add new functionality; it is merely a refactoring.

It feels too rushed to release a breaking change now, as we are adding Python 3.11 support.

If we end up deciding to go with the naming standard, I think we should do it with 0.19 or wait for some bigger changes. Honestly, I think the largest user of the module-level aliases is ourselves; most of the starters and our users go with specific datasets.

astrojuanlu · 2023-08-16T16:18:00Z

We decided to revert 👍🏽 That's basically reverting #263 and adding to the resulting setup.py any extras that were added to pyproject.toml since #263 was merged.

noklam mentioned this issue Aug 15, 2023

Kedro release 0.18.13 - official support for Python 3.11 kedro-org/kedro#2919

Closed

18 tasks

DimedS linked a pull request Aug 15, 2023 that will close this issue

fix(datasets): Add optional dependencies duplication in pyproject.toml #307

Closed

4 tasks

astrojuanlu mentioned this issue Aug 16, 2023

fix(datasets): Add optional dependencies duplication in pyproject.toml #307

Closed

4 tasks

DimedS self-assigned this Aug 16, 2023

This was referenced Aug 16, 2023

Make starter tests use non-deprecated dataset kedro-org/kedro#2933

Closed

Replace "DataSet" with "Dataset" in Markdown files kedro-org/kedro#2735

Merged

DimedS linked a pull request Aug 17, 2023 that will close this issue

fix(datasets): Revert the optional dependencies back to setup.py #310

Merged

4 tasks

astrojuanlu closed this as completed in #310 Aug 17, 2023

astrojuanlu mentioned this issue Aug 18, 2023

[Spike] Decide new name for kedro-datasets optional dependencies #313

Closed
