Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support python 3.11 #2270

Closed
sbrugman opened this issue Jan 31, 2023 · 18 comments
Closed

Support python 3.11 #2270

sbrugman opened this issue Jan 31, 2023 · 18 comments
Assignees
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@sbrugman
Copy link
Contributor

sbrugman commented Jan 31, 2023

Are there any changes that need to be made, or is it as simple as updating the package information?

depends on #2742

@sbrugman sbrugman added the Issue: Feature Request New feature or improvement to existing feature label Jan 31, 2023
@merelcht
Copy link
Member

merelcht commented Feb 2, 2023

That's a good question @sbrugman. I don't know on top of my head how much needs to be changed, but we can definitely add this to our backlog and start the work on it.

@merelcht
Copy link
Member

The first step is to add a CI build with 3.11 and see if that just works, both on kedro and kedro-plugins. If it turns out to be more involved, we should re-estimate or create a follow up ticket.

@noklam noklam self-assigned this Mar 20, 2023
@noklam
Copy link
Contributor

noklam commented Mar 23, 2023

Summary
See details in CircleCI Logs

Going to support this will probably take some effort, I would guess an 8 or more. I stopped here just to timebox the task. I would like to get some comments to see if I need to continue. @merelcht @astrojuanlu

Further questions

  1. Do we ever drop DataSet supports?
    Every dataset added is making CI more complex. In reality, no one really uses all of the datasets, but our CI has to take care of the combinations of all dependencies. This confuses pip as some dependencies are old and it hits weird combination, sometimes it just stop CI to run completely and it's difficult to figure out which package is the root cause due to 2nd or even 3rd level dependencies i.e. C depends on B depends on A, and CI complain about A. I can only follow a trial & error pattern and bump one library at a time.

  2. How do we handle release for kedro-datasets when some libraries are not ready?
    Is this considered a breaking change and we will bump the major version immediately? Semantic Versioning. It also add more complexity to CI, we will need to skip certain test to run in a particular Python version. Kedro-dataset release process kedro-plugins#405

  3. Is there a way to handle (1) better?
    It would be great if the team can share some wisdom, or how to approach this problem systematically. I haven't find a better way to make small changes incrementally. It could be just a quick discussion, if there is no better way it's fine too.

  4. How do we handle versions conflict?
    holoviews is also pinned at some 2020 versions for a while, it is conflicting with the newer matplotlib version, as a result we have this pin in our test_requirements.txt
    Is this the right thing to do? We aren't testing the newer version, so in theory matplotlib could break with version >3.6 but we will not find out in our test suite Check if improvements are needed for test_requirements.txt setup #1498 @dee

matplotlib>=3.0.3, <3.4; python_version < '3.10' # 3.4.0 breaks holoviews
matplotlib>=3.5, <3.6; python_version == '3.10'
"matplotlib.MatplotlibWriter": ["matplotlib>=3.0.3, <4.0"]

@merelcht
Copy link
Member

  • Tensorflow isn't ready for python3.11 - so we can't add Python3.11 until 0.19.

It looks like they added support for python 3.11 yesterday 👀 https://github.com/tensorflow/tensorflow/releases/tag/v2.12.0

@astrojuanlu
Copy link
Member

astrojuanlu commented Mar 24, 2023

Generally speaking I think it's a pity we can't declare Kedro 3.11-compatible because 1 specific dataset is not - this way we are bound to be as conservative as the most conservative of our dependencies. And I completely agree that (1) is a problem for development, at the moment all the dependencies are needed to install the docs and it's a bit of a pain.

I'm thinking if there is a way to decouple the Kedro CI from kedro-org/kedro-plugins#405, but I don't see an easy path. We'd need to selectively install portions of kedro-datasets depending on the Python version (not the end of the world I suppose, but a bit of CI complexity) and mark our tests so that we can conditionally disable them for certain Python versions.

Rehashing what I said in kedro-org/kedro-plugins#405, I'm wondering if it's time to unbundle kedro-datasets a bit, because it's growing quite a lot (I think about this every time I have to install biopython or some other domain-specific packages). Or at least, move some of the problematic datasets that have huge or niche dependencies to separate packages. That would also make community contributions easier I think.

@merelcht
Copy link
Member

Summary See details in CircleCI Logs

Going to support this will probably take some effort, I would guess an 8 or more. I stopped here just to timebox the task. I would like to get some comments to see if I need to continue. @merelcht @astrojuanlu

I'm fine with pausing for now and creating a new ticket for the actual implementation for another spring. If the only things that break are datasets, could we just ignore these tests for 3.11? I guess we might need to also try the 3.11 for kedro-datasets to see if we can easily support it? Ideally we'd get 3.11 on Kedro in a non-breaking release, but for kedro-datasets it's fine to do a breaking release IMO.

Further questions

  1. Do we ever drop DataSet supports?
    Every dataset added is making CI more complex. In reality, no one really uses all of the datasets, but our CI has to take care of the combinations of all dependencies. This confuses pip as some dependencies are old and it hits weird combination, sometimes it just stop CI to run completely and it's difficult to figure out which package is the root cause due to 2nd or even 3rd level dependencies i.e. C depends on B depends on A, and CI complain about A. I can only follow a trial & error pattern and bump one library at a time.

I don't think we should drop support for datasets, but we might need to revisit how we run our datasets tests. Maybe with another setup (tox or something) we could just run every dataset class test separately?

  1. How do we handle release for kedro-datasets when some libraries are not ready?
    Is this considered a breaking change and we will bump the major version immediately? Semantic Versioning. It also add more complexity to CI, we will need to skip certain test to run in a particular Python version. Kedro-dataset release process #2433

Do we know yet what is required on datasets to support 3.11? I guess depending on what we need to do it might be a major bump, which would be fine with me. I don't care too much about the CI setup, we're doing it similarly on Kedro already.

  1. How do we handle versions conflict?
    holoviews is also pinned at some 2020 versions for a while, it is conflicting with the newer matplotlib version, as a result we have this pin in our test_requirements.txt

We might need to review all the dependencies in datasets and check if we can bump them. I don't think we've reviewed many of them since they were added.

@noklam
Copy link
Contributor

noklam commented Mar 27, 2023

Thank you comments on both! I'll try to address all the comments here.

  1. Further unbundle datasets
    My gut feeling is that this creates too many overheads for maintenance, do we start to release them as individual meta-packages? Initially, we briefly discussed the meta-package approach but didn't go with it. Explore how other (meta-)packages handle dependencies #1777 If we have a good idea how this could be implemented we can put Kedro-dataset release process kedro-plugins#405 intol Tech design.

  2. Kedro / kedro-datasets 3.11 support
    We can definitely release kedro-datasets 3.11 separately. My main concern is before kedro 3.11 is out it's not too meaningful to release kedro-datasets for 3.11, should we release them at the same time to make it more cohesive? Right now we cannot support 3.11 for kedro unless we solves all the datasets problem because we still have kedro.extras.datasets, so I don't see a way that we can have 3.11 support for kedro without solving all the datasets problem. @merelcht

  3. Modify the way we run tests
    In theory, running the tests separately isn't the same as installing all the dependencies (But I am not sure if is one superior to another). Imagine a case when datasets have such dependencies. If we do want to run a test like this, we should start with Check if improvements are needed for test_requirements.txt setup #1498, since there is no point to have a different test_requirements.txt anymore.

# Package A
`library_a > 1.0, <2.0

# Pacakge B 
library_a > 2.0, 3.0

Tests will run properly separately, but not when users want to use both datasets at the same time (which is something we guarantee at the moment)

  1. What are needed to support Python 3.11?

Do we know yet what is required on datasets to support 3.11? I guess depending on what we need to do it might be a major bump, which would be fine with me. I don't care too much about the CI setup, we're doing it similarly on Kedro already. @merelcht
I think currently the largest issues are tensorflow, I'll need to run a few more tests since the Python3.11 support was just added last week.

@noklam
Copy link
Contributor

noklam commented Mar 27, 2023

Python3.11 support would be added for Spark 3.4.0. At the moment it's on 3.3.2, the next candidate will be 3.4.0, no schedule yet.

Edited: https://is.spark.released.info/, spark release is coming in 25 days, so we should re-visit it in 2/3 sprints later.
apache/spark#38987

@noklam
Copy link
Contributor

noklam commented Apr 3, 2023

Suggested Plan for release:

  1. Fix all datasets issue and release kedro-datasets support for Python 3.11 - (Might need to point to the main branch instead of pypi because we won't have kedro available for Python 3.11 yet - Add Python3.11 support for kedro-datasets kedro-plugins#154
  2. Release kedro for Python 3.11 - This ticket
  3. Revert the CI to point to pypi (kedro-datasets)

@astrojuanlu
Copy link
Member

Just to recap, what is missing here?

In #2417 there was some discussion about tensorflow, but it looks like tensorflow has Python 3.11 support now (tensorflow/tensorflow#58032 closed at the beginning of April). So, I guess we should try and add 3.11 support for the kedro-datasets CI first?

@astrojuanlu
Copy link
Member

Step (1) above can't proceed unless we have a version of Kedro that bumps requires-python, otherwise it can't even be installed. I think we might need to rework the plan and release Kedro framework with support for Python 3.11 first, otherwise this might be a bit cumbersome. Wondering if we could do that before 0.19, I get asked about this very often.

@noklam
Copy link
Contributor

noklam commented Jun 20, 2023

@astrojuanlu I agree kedro need to be done before kedro-datasets because kedro-datasets depends on kedro instead of the other way round.

I don't see any problem unless we introduce a breaking change, there was comment that should we delay the Python3.11 support for 0.19.0. But I don't see this tagged as 0.19.0 so I assume we can release as soon as we finish. Cc @merelcht and @yetudada for confirmation.

@noklam noklam linked a pull request Jun 20, 2023 that will close this issue
5 tasks
@astrojuanlu
Copy link
Member

First: kedro needs to allow 3.11 before kedro-datasets can start the process. But #2417 got stuck, because our e2e tests install kedro from PyPI:

#2417 (comment)

Second: there have been long conversations about not allowing upper version caps in Requires-Python:

And third: Kedro does work with Python 3.11 already. This is what I did:

  1. Modified pyproject.toml locally to remove the upper cap:
diff --git a/pyproject.toml b/pyproject.toml
index ec06b75d..0421b726 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,7 +10,7 @@ authors = [
     {name = "Kedro"}
 ]
 description = "Kedro helps you build production-ready data and analytics pipelines"
-requires-python = ">=3.7, <3.11"
+requires-python = ">=3.7"
 keywords = [
     "pipelines",
     "machine learning",
  1. Built a wheel and installed it: python -m build && pip install dist/*.whl
  2. Created a new project with the spaceflights starter: kedro new --starter=spaceflights
  3. Commented out the kedro* dependencies in src/requirements.txt, and manually installed pandas pyarrow openpyxl
  4. kedro run 🏁

This worked perfectly!

So, my suggestion is that we proceed as follows:

  1. We release a Kedro version without the upper cap, and we don't ever put it again. This could be the upcoming 0.18.11. We don't tell anyone (not even in the changelog) that technically Kedro can be installed in Python 3.11.
    • Users casually trying Kedro on 3.11 may see installation issues here and there. No worries - Python 3.11 support is by no means official, yet. They will probably arrive to this issue. Not very different from what's happening now.
  2. With that version published, we unblock Attempt to bump python version to 3.11  #2417.
  3. In parallel (no need to even wait for Attempt to bump python version to 3.11  #2417) we start working on kedro-datasets, kedro-viz, kedro-telemetry, kedro-airflow.
  4. In the next release (say 0.18.12) we declare official Python 3.11 support on our changelog (maybe with some exceptions of problematic datasets that we need to skip on CI)
  5. In 0.19.0 we drop Python 3.7 (Drop support for Python 3.7 end of year 2023 #2158)

How does that sound?

@noklam
Copy link
Contributor

noklam commented Jun 28, 2023

kedro doesn’t seem to be a problem once I open the upper bound. Maybe it never install kedro from requirements.txt because of this

def _setup_minimal_env(context):
kedro_install_venv_dir = _create_new_venv()
context.kedro_install_venv_dir = kedro_install_venv_dir
context = _setup_context_with_venv(context, kedro_install_venv_dir)
call(
[
context.python,
"-m",
"pip",
"install",
"-U",
"pip>=21.2",
"setuptools>=65.5.1",
"wheel",
],
env=context.env,
)
call([context.python, "-m", "pip", "install", "."], env=context.env)

Even if we release an open bound kedro, we will be still blocked by kedro-telemetry which depends on kedro and don’t have 3.11 yet.

https://app.circleci.com/pipelines/github/kedro-org/kedro/23779/workflows/2d306aed-ed9b-498a-aec4-bb05f5429fb9/jobs/273978

When we release 0.18, we also have a failing e2e test that we bypass manually. #1399

@astrojuanlu astrojuanlu mentioned this issue Jun 28, 2023
8 tasks
astrojuanlu added a commit that referenced this issue Jul 3, 2023
astrojuanlu added a commit that referenced this issue Jul 4, 2023
astrojuanlu added a commit that referenced this issue Jul 4, 2023
astrojuanlu added a commit that referenced this issue Jul 5, 2023
astrojuanlu added a commit that referenced this issue Jul 10, 2023
astrojuanlu added a commit that referenced this issue Jul 10, 2023
astrojuanlu added a commit that referenced this issue Jul 10, 2023
astrojuanlu added a commit that referenced this issue Jul 11, 2023
* Allow any version of Python in metadata

See https://iscinumpy.dev/post/bound-version-constraints/#pinning-the-python-version-is-special,
and discussion in https://discuss.python.org/t/requires-python-upper-limits/12663?u=astrojuanlu,
as well as practical problems with leaving the version cap on in
#2270 (comment).

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* Add toggleable error if Python version is incompatible

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

* temp commit

Signed-off-by: Nok <[email protected]>

* change gitpod

Signed-off-by: Nok <[email protected]>

* revert GitPod

Signed-off-by: Nok <[email protected]>

* add tests

Signed-off-by: Nok <[email protected]>

* Fix import tests

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>

---------

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>
Signed-off-by: Nok <[email protected]>
Co-authored-by: Nok <[email protected]>
@noklam
Copy link
Contributor

noklam commented Aug 2, 2023

Quick update about the Python 3.11 work. 2 remaining blockers:
Matplotlib seems getting into the way - this shows up after we move to Github Action CI, potentially build issue
@Sajid Alam
is looking at this
We need a kedro-telemetry for Python 3.11

@SajidAlamQB
Copy link
Contributor

🎉 Great news! We've merged the Python 3.11 work. Check out the PR here: #2851. We appreciate your feedback – please don't hesitate to share any thoughts or issues you come across.

@noklam
Copy link
Contributor

noklam commented Aug 8, 2023

This will be coming in 0.18.13, if you can't wait. You can also clone the repository and install from main branch.

@merelcht
Copy link
Member

merelcht commented Aug 9, 2023

Completed in #2851

@merelcht merelcht closed this as completed Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants