Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF] Lazily import heavy modules to speed up import times #2826

Merged
merged 41 commits into from
Sep 19, 2024

Conversation

desmondcheongzx
Copy link
Contributor

@desmondcheongzx desmondcheongzx commented Sep 10, 2024

Introduce lazy imports for heavy modules that are not needed as top-level imports. For example, ray does not need to be a top level import (it should only be imported when using the ray runner or when specific ray data extension types needed. Another example would be UnityCatalogTable, which is a relatively heavy import despite only being needed when using delta lake.

Modules to import lazily were determined by the proportion of import time as shown by importtime-output-wrapper -c 'import daft' --format waterfall --depth 25.

The list of newly lazily imported modules are:

  • daft.unity_catalog
  • fsspec
  • numpy
  • pandas
  • PIL.Image
  • pyarrow
  • pyarrow.csv
  • pyarrow.dataset
  • pyarrow.fs
  • pyarrow.json
  • pyarrow.parquet
  • ray
  • ray.data.extensions
  • xml.etree.ElementTree

Uses #2836 in order to defer the import of pyarrow.

Additionally, we move all type-checking-only module imports into type checking blocks.

With these changes, import times go from roughly 0.6-0.7s to ~0.045s (~13-15x faster).

Copy link

codspeed-hq bot commented Sep 10, 2024

CodSpeed Performance Report

Merging #2826 will degrade performances by 42.95%

Comparing desmondcheongzx:add-lazy-import (f888e49) with main (dba931f)

Summary

⚡ 2 improvements
❌ 1 regressions
✅ 13 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main desmondcheongzx:add-lazy-import Change
test_count[1 Small File] 13.6 ms 23.9 ms -42.95%
test_explain[100 Small Files] 82 ms 33.2 ms ×2.5
test_show[100 Small Files] 318 ms 56.3 ms ×5.6

Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!! Nice work.

I was also wondering if we can put in some lint rules as a follow on that will prohibit folks in the future from raw import our list of "bad" deps

It seems that ruff does support this
astral-sh/ruff#2656

logger = logging.getLogger(__name__)
fsspec = LazyImport("fsspec")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow on, we should look into if we can just depreciate fsspec. I'm not sure if we need it as a default dep anymore

Any thoughts @jaychia?

daft/series.py Outdated Show resolved Hide resolved
@desmondcheongzx desmondcheongzx force-pushed the add-lazy-import branch 2 times, most recently from 7daa1d4 to 734a44d Compare September 16, 2024 23:37
Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

daft/runners/ray_runner.py Outdated Show resolved Hide resolved
Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@desmondcheongzx desmondcheongzx enabled auto-merge (squash) September 19, 2024 20:37
@desmondcheongzx desmondcheongzx merged commit 78a92a2 into Eventual-Inc:main Sep 19, 2024
35 checks passed
@desmondcheongzx desmondcheongzx deleted the add-lazy-import branch September 19, 2024 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants