feat: will PyArrow+pandas be made optional for backends? #10120

Closed
1 task done
MarcoGorelli opened this issue Sep 13, 2024 · 6 comments
Labels
feature Features or general enhancements

Comments

@MarcoGorelli

MarcoGorelli commented Sep 13, 2024

Is your feature request related to a problem?

As far as I understand, pandas+pyarrow are now optional for pip install ibis-framework, but still required for all backends

What is the motivation behind your request?

There was a request recently in Narwhals that I thought Ibis might be better suited for, but the poster responded with

def don't want anything to do with pyarrow as a dependency 😁

Describe the solution you'd like

Would you consider making PyArrow / pandas optional for backends?

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

MarcoGorelli added the "feature" (Features or general enhancements) label Sep 13, 2024

@lostmygithubaccount
Member

lostmygithubaccount commented Sep 13, 2024

asked over there for the rationale -- one of the engineers can weigh in but my understanding is it's still a good amount of work with fairly minimal benefit for users. the main reason cited in the past has been running in AWS Lambda and other FaaS, but you can very easily use PyArrow or other larger dependencies in those tools (i.e. I don't think this was ever a particularly valid reason, so would be great to understand this person's perspective)

@MarcoGorelli
Author

thanks for your response!

just for my understanding - supposing it were possible, would you be open to such a PR?

@lostmygithubaccount
Member

lostmygithubaccount commented Sep 16, 2024

I personally don't see why we wouldn't. I think given infinite time and resources, this is definitely something we would do -- Phillip already made it possible as you note without a backend. of course, we'd want to ensure no functionality is lost. it'd be good to have the engineers weigh in (we'll discuss this at some point this week and can respond back here if they don't already from the GH notifications)

@kylebarron

kylebarron commented Sep 16, 2024

FWIW I'm also interested in using ibis without requiring pyarrow as a dependency. I don't have anything against pyarrow personally, but it's a very big dependency to force on all users of a library (see the pandas v3 discussion) and with the Arrow PyCapsule Interface it's now a lot easier to use alternative, smaller Python Arrow implementations, like nanoarrow or my own.

If substrait is now maturing, then any backend that can consume substrait (e.g. at least DuckDB) could in theory remove the pyarrow dependency pretty easily?

@lostmygithubaccount
Member

lostmygithubaccount commented Sep 16, 2024

> If substrait is now maturing, then any backend that can consume substrait (e.g. at least DuckDB) could in theory remove the pyarrow dependency pretty easily?

I don't think these things are related -- the long-term vision is substrait as intermediary representation (and Ibis can already produce Substrait plans), but I wouldn't expect Ibis to "switch" anytime soon for a bunch of technical/data system adoption reasons (e.g. DuckDB's Substrait consumption tends to be far more buggy than SQL)

not that it's hard to find but link to the pandas discussion for context: pandas-dev/pandas#57073

nanoarrow (or arro3) does seem like an interesting option but we're beyond my technical depth 😄

@gforsyth
Member

I'm closing this out in favor of #10166. TL;DR: we're interested in making sure that our usage of pandas and pyarrow is cleanly separable from other backend functionality, but we aren't (in the short term) going to remove pyarrow as a dependency because we don't want new users to have to install multiple extras to have a functional ibis installation.
