Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kedro-datasets 1.5.2 breaks pip install kedro-datasets[all] #306

Closed
Tracked by #2919
noklam opened this issue Aug 15, 2023 · 6 comments · Fixed by #310
Closed
Tracked by #2919

kedro-datasets 1.5.2 breaks pip install kedro-datasets[all] #306

noklam opened this issue Aug 15, 2023 · 6 comments · Fixed by #310
Assignees

Comments

@noklam
Copy link
Contributor

noklam commented Aug 15, 2023

Description

Short description of the problem here.
doing pip install kedro-dataset[all]==1.5.2 will get these warnings.

WARNING: kedro-datasets 1.5.2 does not provide the extra ‘api-apidataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘biosequence-biosequencedataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘dask-parquetdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘databricks-managedtabledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘geopandas-geojsondataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘holoviews-holoviewswriter’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘matplotlib-matplotlibwriter’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘networkx-networkxdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘pickle-pickledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘pillow-imagedataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘plotly-jsondataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘plotly-plotlydataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘polars-csvdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘redis-pickledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘snowflake-snowparktabledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-deltatabledataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-sparkdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-sparkhivedataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘spark-sparkjdbcdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘svmlight-svmlightdataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘tensorflow-tensorflowmodeldataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘video-videodataset’
WARNING: kedro-datasets 1.5.2 does not provide the extra ‘yaml-yamldataset’

Context

How has this bug affected you? What were you trying to accomplish?

Steps to Reproduce

  1. [First Step]
  2. [Second Step]
  3. [And so on...]

Expected Result

Tell us what should happen.

Actual Result

Tell us what happens instead.

-- If you received an error, place it here.
-- Separate them if you have more than one.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V):
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow):
  • Python version used (python -V):
  • Operating system and version:
@astrojuanlu
Copy link
Member

The root cause of these problems is that kedro-datasets extras names just doesn't align with Python packaging standards: https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-use

Provides-Extra (multiple use)
Changed in version 2.3: PEP 685 restricted valid values to be unambiguous (i.e. no normalization required).
...
A string containing the name of an optional feature. A valid name consists only of lowercase ASCII letters, ASCII numbers, and hyphen.

Therefore, pip install kedro-datasets[pandas.CSVDataSet] is out of line with current packaging standards.

From https://peps.python.org/pep-0685/:

When comparing extra names, tools MUST normalize the names being compared using the semantics outlined in PEP 503 for names:
re.sub(r"[-_.]+", "-", name).lower()
The core metadata specification will be updated such that the allowed names for Provides-Extra matches what PEP 508 specifies for names.

So, why was this working before? See also in the PEP:

Tools generating metadata MUST raise an error if a user specified two or more extra names which would normalize to the same name.

The reason is: setuptools hasn't fully implemented PEP 685 yet: pypa/setuptools#3586

So, our mangling of extras in setup.py before #263 worked, but after transitioning to self-referential extras, we fell victims of name normalization.

I'm voting against switching to Poetry, Hatch, PDM, or any other system that allows this behavior, because it's going to bite us in the future. We need a short-term solution (either revert #263 or go for @DimedS #307) and a long term solution (possibly deprecating extras names with dots and offer an alternative syntax for our users).

@DimedS
Copy link
Contributor

DimedS commented Aug 16, 2023

The root cause of these problems is that kedro-datasets extras names just doesn't align with Python packaging standards: https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-use

Provides-Extra (multiple use)
Changed in version 2.3: PEP 685 restricted valid values to be unambiguous (i.e. no normalization required).
...
A string containing the name of an optional feature. A valid name consists only of lowercase ASCII letters, ASCII numbers, and hyphen.

Therefore, pip install kedro-datasets[pandas.CSVDataSet] is out of line with current packaging standards.

From https://peps.python.org/pep-0685/:

When comparing extra names, tools MUST normalize the names being compared using the semantics outlined in PEP 503 for names:
re.sub(r"[-_.]+", "-", name).lower()
The core metadata specification will be updated such that the allowed names for Provides-Extra matches what PEP 508 specifies for names.

So, why was this working before? See also in the PEP:

Tools generating metadata MUST raise an error if a user specified two or more extra names which would normalize to the same name.

The reason is: setuptools hasn't fully implemented PEP 685 yet: pypa/setuptools#3586

So, our mangling of extras in setup.py before #263 worked, but after transitioning to self-referential extras, we fell victims of name normalization.

I'm voting against switching to Poetry, Hatch, PDM, or any other system that allows this behavior, because it's going to bite us in the future. We need a short-term solution (either revert #263 or go for @DimedS #307) and a long term solution (possibly deprecating extras names with dots and offer an alternative syntax for our users).

Good points. I completely agree that the central issue is naming, and we need to maintain consistency with the standards going forward. Changing names would be a breaking release. Should we include this in the 0.19 scope? Additionally, I concur that transitioning to Poetry or other alternatives should be a long-term strategic decision, with a thorough analysis of its pros and cons.

@merelcht
Copy link
Member

The naming change would just be for kedro-datasets right? We could just release 2.0.0 for datasets with that breaking change.

@DimedS
Copy link
Contributor

DimedS commented Aug 16, 2023

The naming change would just be for kedro-datasets right? We could just release 2.0.0 for datasets with that breaking change.

In any case, we need a new release. Should we aim for 2.0.0 soon? Alternatively, would a temporary solution with version 1.5.3 be better?

@noklam
Copy link
Contributor Author

noklam commented Aug 16, 2023

I am in favor of reverting to setup.py now. I think a short term solution is enough. This isn't a functional change. i.e. It does not add new function but merely refactoring.

I feel too rush to release a breaking change now, as we are adding Python 3.11 support.

If we end up decide rolling with the naming standard, I think we should do it with 0.19 or wait for some bigger changes. Honestly I think the largest user of module level alias is ourselves, most of the starters and our user go with specific dataset.

@DimedS DimedS self-assigned this Aug 16, 2023
@astrojuanlu
Copy link
Member

We decided to revert 👍🏽 That's basically reverting #263 and adding back any extras that were added to pyproject.toml since it was merged to the resulting setup.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
4 participants