feat: make some (most) dependencies optional? #7844
Comments
Hi @MarcoGorelli 👋🏻! How did you install Ibis? Most of Ibis's dependencies are optional. The following dependencies are listed as required:
atpublic = ">=2.3,<5" # __all__ handling
bidict = ">=0.22.1,<1" # `.cache()` implementation
filelock = ">=3.7.0,<4" # this should definitely be optional
multipledispatch = ">=0.6,<2" # required for various internals
numpy = ">=1,<2" # required for `execute()`
pandas = ">=1.2.5,<3" # required for `execute()`
parsy = ">=2,<3" # required for type parsing, might be possible to make optional
pins = { version = ">=0.8.3,<1", extras = ["gcs"] } # see note below
pyarrow = ">=2,<15" # required for `to_pyarrow()`
pyarrow-hotfix = ">=0.4,<1" # security patch for pyarrow
python-dateutil = ">=2.8.2,<3" # timezones
pytz = ">=2022.7" # timezones, yes there are two libraries for timezone handling because users can provide both
rich = ">=12.4.4,<14" # interactive mode table formatting
sqlglot = ">=18.12.0,<21" # general structured sql manipulation and pretty printing
toolz = ">=0.11,<1" # various python data structure utilities
# tons more below, all related to backends or optional features
Notes
|
I've opened #7845 to move |
There are a few pieces of information that suggest
How are you determining that |
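One way to answer this kind of question is to read the requirements a distribution declares in its installed metadata and separate those gated behind an extra from the unconditional ones. The following is a minimal sketch using only the standard library; the helper names `declared_requirements` and `split_requirements` are hypothetical, and treating any marker that mentions `extra` as "optional" is a simplifying assumption rather than a full marker evaluation.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares, or [] if unknown."""
    try:
        # requires() returns None when the distribution declares nothing.
        return requires(dist_name) or []
    except PackageNotFoundError:
        return []


def split_requirements(requirement_strings):
    """Split requirement strings into (required, optional) lists.

    Heuristic (an assumption of this sketch): a requirement counts as
    optional when its environment marker mentions an extra, as in
    'duckdb>=0.8.1; extra == "duckdb"'.
    """
    required, optional = [], []
    for req in requirement_strings:
        # Everything after the first ';' is the environment marker.
        _, _, marker = req.partition(";")
        if "extra" in marker:
            optional.append(req)
        else:
            required.append(req)
    return required, optional


if __name__ == "__main__":
    required, optional = split_requirements(declared_requirements("ibis-framework"))
    print("required:", required)
    print("optional:", len(optional), "entries")
```

Comparing the `required` list with the output of `pip list` shows which installed packages arrive as direct requirements and which are pulled in transitively.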
thanks for looking into this!
I just ran |
For DuckDB, sqlalchemy is currently required because we're using
We're working on removing sqlalchemy entirely, but that's at least 2 major releases away (ibis 9.x).
Any chance you can show |
ah thanks! have just got a train (w/o reliable wifi) so I can try again next week when back |
hi again - hope you had a nice xmas if you celebrated
here's the requested output:
$ pip list
Package Version
------------------------ ------------
aiohttp 3.9.1
aiosignal 1.3.1
appdirs 1.4.4
atpublic 4.0
attrs 23.1.0
bidict 0.22.1
cachetools 5.3.2
certifi 2023.11.17
charset-normalizer 3.3.2
decorator 5.1.1
duckdb 0.9.2
duckdb_engine 0.10.0
filelock 3.13.1
frozenlist 1.4.1
fsspec 2023.6.0
gcsfs 2023.6.0
google-api-core 2.15.0
google-auth 2.25.2
google-auth-oauthlib 1.2.0
google-cloud-core 2.4.1
google-cloud-storage 2.14.0
google-crc32c 1.5.0
google-resumable-media 2.7.0
googleapis-common-protos 1.62.0
greenlet 3.0.3
humanize 4.9.0
ibis-framework 7.2.0
idna 3.6
importlib-metadata 7.0.1
importlib-resources 6.1.1
Jinja2 3.1.2
joblib 1.3.2
markdown-it-py 3.0.0
MarkupSafe 2.1.3
mdurl 0.1.2
multidict 6.0.4
multipledispatch 1.0.0
numpy 1.26.2
oauthlib 3.2.2
pandas 2.1.4
parsy 2.1
pins 0.8.3
pip 23.2.1
protobuf 4.25.1
pyarrow 14.0.2
pyarrow-hotfix 0.6
pyasn1 0.5.1
pyasn1-modules 0.3.0
Pygments 2.17.2
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
requests 2.31.0
requests-oauthlib 1.3.1
rich 13.7.0
rsa 4.9
setuptools 65.5.0
six 1.16.0
SQLAlchemy 2.0.24
sqlalchemy-views 0.3.2
sqlglot 20.4.0
toolz 0.12.0
typing_extensions 4.9.0
tzdata 2023.3
urllib3 2.1.0
xxhash 3.4.1
yarl 1.9.4
zipp 3.17.0
I'm asking about this because I'm curious to see if
Currently I don't think it's feasible, I don't think a dataframe-consuming library (say, scikit-learn, skrub, hvplot, seaborn, ...there's a few 😄) would be willing to take on so many extra dependencies in exchange for cross-dataframe compatibility.
If I understand correctly,
Just wondering, though, if it could be possible to write, say,
penguins.aggregate(
    by="species",
    total_bill_depth=penguins.bill_depth_mm.sum(),
    avg_bill_length=penguins.bill_length_mm.mean(),
)
and have it dispatch to the underlying library natively, but only requiring an extra lightweight dependency |
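To make the "dispatch natively with a lightweight dependency" idea concrete, here is a rough sketch of one possible approach: choose an implementation from the package that defines the object's type, without importing any dataframe library up front. This is not how Ibis works; `backend_of`, `register`, and `column_sum` are hypothetical names used only for illustration, and the pandas and polars implementations assume only that `df[column].sum()` is available on those objects.

```python
_SUM_IMPLS = {}


def backend_of(obj):
    """Return the top-level package name that defines obj's type."""
    return type(obj).__module__.split(".")[0]


def register(backend):
    """Decorator that registers an implementation under a package name."""
    def decorator(func):
        _SUM_IMPLS[backend] = func
        return func
    return decorator


@register("pandas")
def _sum_pandas(df, column):
    # Runs only if the caller passed a pandas object, so pandas is
    # never imported by this module itself.
    return df[column].sum()


@register("polars")
def _sum_polars(df, column):
    return df[column].sum()


def column_sum(df, column):
    """Sum a column using the implementation for df's own library."""
    backend = backend_of(df)
    try:
        impl = _SUM_IMPLS[backend]
    except KeyError:
        raise TypeError(
            f"unsupported dataframe type from package {backend!r}"
        ) from None
    return impl(df, column)
```

Because nothing is imported at module level, a layer like this would add no required dependencies of its own; the cost is that every operation needs a separate implementation per library.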
I'm 👍 on introducing an
Could you elaborate on why you feel that
Another idea might be to have separate packages --
In all these spellings the user needs to refer to some form of the documentation (even if just the README) to be able to install the right things -- though allowing different use-cases more control over the transitive dependencies which aren't always required.
Another idea might be to have the examples include something like this:
try:
    import pins
except ImportError:
    exit("Run `pip install ibis-framework[examples]` to use the examples")
Kinda ugly in the code, but stays somewhat user-friendly if people aren't used to diagnosing |
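The guard above could also be factored into a small reusable helper, so each optional feature reports which extra to install. This is only a sketch: `require_optional` is a hypothetical name, and the `examples` extra is taken from the suggestion above rather than from a published Ibis extra.

```python
import importlib


def require_optional(module_name, extra, package="ibis-framework"):
    """Import an optional module, or raise with an install hint.

    `extra` is the name of the extra that is assumed to provide
    `module_name`; it is only used to build the error message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature. "
            f"Install it with `pip install '{package}[{extra}]'`."
        ) from exc


def load_examples_backend():
    # Imported lazily, so users who never touch the examples
    # do not need `pins` installed.
    return require_optional("pins", "examples")
```

Raising `ImportError` instead of calling `exit()` keeps the failure catchable by library users while still naming the missing extra.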
I think it's reasonable to have a way of installing the minimum necessary dependencies needed to run Ibis code on a given backend. It is quite undesirable and annoying to be dependency-constrained by packages you'll never use. The shortest path to that seems to be having an
Separating packages might be possible, but considering how much more effort that is than making
I'll put up a PR to make an |
Thanks! |
Is your feature request related to a problem?
I'd like to use the ibis API to work with a pandas dataframe
Describe the solution you'd like
To be able to do just that, without having to install:
and so many more
I may have misunderstood (in which case, sorry, apologies), but are these really required dependencies?
What version of ibis are you running?
7.2.0
What backend(s) are you using, if any?
duckdb