[Core] PyArrow version is out of date #29503

Closed
spolcyn opened this issue Oct 20, 2022 · 6 comments
Labels: bug (Something that is supposed to be working; but isn't), data (Ray Data-related issues), P1 (Issue that should be fixed within a few weeks)

Comments

@spolcyn
Contributor

spolcyn commented Oct 20, 2022

What happened + What you expected to happen

The current PyArrow version is pinned at <7, which prevents us from using Ray and a modern PyArrow version (e.g., v9) at the same time: https://github.com/ray-project/ray/blob/master/python/requirements.txt#L61.

We'd expect to be able to use the latest version with Ray.

Versions / Dependencies

Ray 2.0.0

Reproduction script

Attempt to build a conda env with ray[all] and pyarrow=9.0.0
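
For reference, a minimal sketch of checking a version against the `<7` pin described above, without invoking conda at all. The helper names (`major_version`, `satisfies_upper_bound`, `installed_version`) are made up for illustration and are not part of Ray or PyArrow; the check only compares the major version, which is an assumption about how the pin should be read.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '9.0.0'."""
    return int(version.split(".")[0])


def satisfies_upper_bound(version: str, upper_major: int = 7) -> bool:
    """True if `version` is below the exclusive major-version bound (the `<7` pin)."""
    return major_version(version) < upper_major


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pyarrow")
    if found is None:
        print("pyarrow is not installed")
    elif satisfies_upper_bound(found):
        print(f"pyarrow {found} is within the <7 pin")
    else:
        print(f"pyarrow {found} conflicts with the <7 pin")
```

With this sketch, `satisfies_upper_bound("9.0.0")` is false, which matches the conflict the conda solver reports for the environment above.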

Issue Severity

Low: It annoys or frustrates me.

@spolcyn added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels Oct 20, 2022
@clarkzinzow
Contributor

clarkzinzow commented Oct 20, 2022

Hi @spolcyn, thank you for opening this issue! This upper bound on Arrow is due to a bug in pickling Arrow data that would make using Arrow 7+ in Ray untenable. I actually have a PR out that adds support for Arrow 7+ (fixing this Arrow serialization issue with our own workaround), up to the latest stable release (Arrow 9). We expect this to land in Ray 2.2.

Are there any details you can share about why you require support for Arrow 9? E.g. is there an Arrow feature/bug fix that you need, or do you have a dependency that requires a recent Arrow version?

@spolcyn
Contributor Author

spolcyn commented Oct 21, 2022

@clarkzinzow Great, glad to see that!

At this moment, we don't have any hard requirements for Arrow 7+ -- the primary goal is to make sure the ball is already rolling on PyArrow support, so we don't end up in a situation where there is a hard requirement (from us, from another package, etc.).

@cadedaniel added the P1 (Issue that should be fixed within a few weeks) label and removed the triage (Needs triage, e.g. priority, bug/not-bug, and owning component) label Oct 21, 2022
@thatcort

I just hit this pyarrow version problem when I tried to use pyarrow.Table.to_pylist() and discovered it doesn't exist.
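
For anyone stuck on an older PyArrow in the meantime, here is a minimal sketch of a fallback. It assumes the table object still exposes `to_pydict()` (column name to list of values) when `to_pylist()` is missing. The helper names `columns_to_rows` and `table_to_pylist` are hypothetical and not part of PyArrow or Ray.

```python
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a column-oriented dict into a list of row dicts."""
    names = list(columns)
    # zip(*values) walks the columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


def table_to_pylist(table: Any) -> List[Dict[str, Any]]:
    """Use table.to_pylist() when available, else rebuild rows from to_pydict()."""
    to_pylist = getattr(table, "to_pylist", None)
    if callable(to_pylist):
        return to_pylist()
    # Assumption: older versions still provide to_pydict().
    return columns_to_rows(table.to_pydict())
```

The helper is duck-typed, so it works on any object with one of those two methods, and it stops being necessary once the environment can move to a PyArrow version that has `to_pylist()`.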

@clarkzinzow self-assigned this Nov 18, 2022
@clarkzinzow added the air and data (Ray Data-related issues) labels Nov 18, 2022
@clarkzinzow added this to the Arrow 7+ Support milestone Nov 18, 2022
@clarkzinzow
Contributor

@thatcort @spolcyn Just to note, support for Arrow 7 through Arrow 10 (and Arrow nightly) has been added in Ray nightly, and will be released in Ray 2.2! See this milestone for all issues involved with this effort, and the below individual issues and sub-issues for each bug addressed.

@spolcyn
Contributor Author

spolcyn commented Nov 18, 2022

Thanks for making this happen! Looking forward to Ray 2.2

@thatcort

Hooray! :)


4 participants