[PERF] Lazily import heavy modules to speed up import times #2826
CodSpeed Performance Report: Merging #2826 will degrade performances by 42.95%.
Looks great to me! Nice work.
As a follow-up, I was also wondering whether we can add lint rules that prohibit raw imports of anything on our list of "bad" deps in the future.
It seems that ruff does support this: astral-sh/ruff#2656
daft/filesystem.py
logger = logging.getLogger(__name__)
fsspec = LazyImport("fsspec")
As a follow-up, we should look into whether we can just deprecate fsspec. I'm not sure we need it as a default dep anymore.
Any thoughts, @jaychia?
Looks good!
Looks good!
Introduce lazy imports for heavy modules that are not needed as top-level imports. For example, ray does not need to be a top-level import (it should only be imported when using the ray runner or when specific ray data extension types are needed). Another example would be UnityCatalogTable, which is a relatively heavy import despite only being needed when using delta lake.

Modules to import lazily were determined by the proportion of import time as shown by:
importtime-output-wrapper -c 'import daft' --format waterfall --depth 25

The list of newly lazily imported modules is:
daft.unity_catalog
fsspec
numpy
pandas
PIL.Image
pyarrow
pyarrow.csv
pyarrow.dataset
pyarrow.fs
pyarrow.json
pyarrow.parquet
ray
ray.data.extensions
xml.etree.ElementTree
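
For readers unfamiliar with the pattern, here is a minimal sketch of how a lazy-import proxy can work. It is not the implementation in this PR: the class body, the `_load` and `module_available` helpers, and the use of the standard-library `decimal` module as a stand-in for a heavy dependency are all assumptions made for illustration.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyImport:
    """Hypothetical proxy that defers importing a module until first use."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self._module: Optional[ModuleType] = None

    def _load(self) -> ModuleType:
        # Import on first use only; later calls reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

    def module_available(self) -> bool:
        # Hypothetical helper: report whether the import succeeds, without raising.
        try:
            self._load()
            return True
        except ImportError:
            return False

    def __getattr__(self, name: str) -> Any:
        # Python calls this only when normal lookup fails, i.e. for names
        # that live on the wrapped module rather than on the proxy itself.
        return getattr(self._load(), name)


# Stand-in for a heavy dependency; nothing has been imported at this point.
decimal = LazyImport("decimal")

# The real import happens here, on the first attribute access.
total = decimal.Decimal("1.5") + decimal.Decimal("2.5")
```

The trade-off of this design is that an import error surfaces at the first attribute access rather than at module load time, which is why a check such as the hypothetical `module_available` is handy for optional dependencies.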
Uses #2836 in order to defer the import of pyarrow.

Additionally, we move all type-checking-only module imports into type checking blocks.
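
As an illustration of the type-checking-block pattern, the sketch below uses the standard-library `fractions` module as a stand-in for a heavy dependency; the function `halve` is a hypothetical example and does not come from this PR.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type checkers treat this branch as taken, but TYPE_CHECKING is False
    # at runtime, so the import is never executed when the module loads.
    from fractions import Fraction


def halve(value: "Fraction") -> "Fraction":
    # The annotations are strings, so the name Fraction does not need to
    # exist at runtime; only the type checker resolves it.
    return value / 2
```

This only works for names used purely in annotations. Anything needed at runtime (for example in an `isinstance` check) still has to be imported for real, either eagerly or lazily.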
With these changes, import times go from roughly 0.6-0.7s to ~0.045s (~13-15x faster).
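
To reproduce this kind of measurement without extra tooling, one option is CPython's built-in `-X importtime` flag, which prints per-module timings to stderr. The sketch below is an assumption about how one might rank modules by cumulative import cost; it is not the tool used above, and the helper names `parse_importtime` and `slowest_imports` are hypothetical.

```python
import subprocess
import sys


def parse_importtime(stderr_text: str) -> dict[str, int]:
    """Map module name to cumulative import time in microseconds."""
    prefix = "import time:"
    cumulative: dict[str, int] = {}
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        _self_us, cumulative_us, name = (field.strip() for field in fields)
        if not cumulative_us.isdigit():
            continue  # skip the header row
        cumulative[name] = int(cumulative_us)
    return cumulative


def slowest_imports(statement: str, top: int = 5) -> list[tuple[str, int]]:
    # Use a fresh interpreter so modules already cached in this process
    # do not hide their real import cost.
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    timings = parse_importtime(result.stderr)
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)[:top]
```

For example, `slowest_imports("import daft", top=10)` would list the ten modules with the largest cumulative import time, assuming daft is installed in the current environment.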