Reduce compile time of DataFusion? #348

Closed
jorgecarleitao opened this issue May 15, 2021 · 7 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jorgecarleitao
Member

During development of DataFusion, the compile time I am getting is hurting my development loop. I am not sure if others feel the same, and I would like to gauge this here.

I admit I do not have a great machine, but I wonder if others also feel some pain on this front.

Note that this is unrelated to the dependencies; it happens, for example, when changing a single physical node.

@jorgecarleitao jorgecarleitao added the enhancement New feature or request label May 15, 2021
@andygrove
Member

FWIW, a full release build after cargo clean takes 1 min 35 seconds for me. The non-release build takes 58 seconds.

@alamb
Contributor

alamb commented May 21, 2021

cargo test -p datafusion goes much faster for me, btw, than cargo test -- the dependency stack for ballista and some datafusion examples is distinct

@houqp
Member

houqp commented May 24, 2021

Being spoiled by Golang, I have to say it's relatively slow, to the extent that I had to switch to doing something else while waiting for the build whenever I needed to test a change, even if it was just a single line.

@houqp houqp added the help wanted Extra attention is needed label Oct 18, 2021
@alamb
Contributor

alamb commented Feb 3, 2022

I have a coworker who reports that they can't work on datafusion on a Mac mini with 8GB of RAM because rust-analyzer swaps too much

It would be great to make this better somehow.

@alamb
Contributor

alamb commented Feb 3, 2022

I ran the following to see where the time was going in my normal development loop:

cargo +nightly test -p datafusion -Z timings

And the output was instructive. Specifically it seems to take 28.6 seconds to compile the actual datafusion crate, but a whopping 65.6s to compile the tests (aka cfg(test)). I'll keep plugging away at moving test code out of the datafusion crate and into integration tests

   Completed datafusion v6.0.0 in 28.6s
   Completed datafusion v6.0.0 test "merge_fuzz" (test) in 4.6s
   Completed datafusion v6.0.0 test "order_spill_fuzz" (test) in 4.7s
   Completed datafusion v6.0.0 test "simplification" (test) in 4.8s
   Completed datafusion v6.0.0 test "dataframe" (test) in 6.1s
   Completed datafusion v6.0.0 test "provider_filter_pushdown" (test) in 6.7s
   Completed datafusion v6.0.0 test "custom_sources" (test) in 6.7s
   Completed datafusion v6.0.0 test "statistics" (test) in 6.8s
   Completed datafusion v6.0.0 test "path_partition" (test) in 6.9s
   Completed datafusion v6.0.0 test "user_defined_plan" (test) in 7.0s
   Completed datafusion v6.0.0 test "dataframe_functions" (test) in 7.4s
   Completed datafusion v6.0.0 test "parquet_pruning" (test) in 7.9s
   Completed datafusion v6.0.0 test "sql_integration" (test) in 19.2s
   Completed datafusion v6.0.0 lib (test) in 65.6s
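
To make the breakdown above easier to compare, here is a minimal sketch that groups the `Completed` lines by kind and sums the reported times. The helper names (`parse_timings`, `categorize`, `summarize`) and the regular expression are illustrative assumptions, not part of cargo or DataFusion; the numbers are copied from the output above. Note that these are per-unit compile times, and cargo may build units in parallel, so the sums are not wall-clock time.

```python
import re
from collections import defaultdict

# Timing lines copied verbatim from the comment above.
TIMINGS = """\
   Completed datafusion v6.0.0 in 28.6s
   Completed datafusion v6.0.0 test "merge_fuzz" (test) in 4.6s
   Completed datafusion v6.0.0 test "order_spill_fuzz" (test) in 4.7s
   Completed datafusion v6.0.0 test "simplification" (test) in 4.8s
   Completed datafusion v6.0.0 test "dataframe" (test) in 6.1s
   Completed datafusion v6.0.0 test "provider_filter_pushdown" (test) in 6.7s
   Completed datafusion v6.0.0 test "custom_sources" (test) in 6.7s
   Completed datafusion v6.0.0 test "statistics" (test) in 6.8s
   Completed datafusion v6.0.0 test "path_partition" (test) in 6.9s
   Completed datafusion v6.0.0 test "user_defined_plan" (test) in 7.0s
   Completed datafusion v6.0.0 test "dataframe_functions" (test) in 7.4s
   Completed datafusion v6.0.0 test "parquet_pruning" (test) in 7.9s
   Completed datafusion v6.0.0 test "sql_integration" (test) in 19.2s
   Completed datafusion v6.0.0 lib (test) in 65.6s
"""

# Hypothetical pattern for the line shape shown above: an optional
# target description between the version and the "in <seconds>s" suffix.
LINE = re.compile(r"\s*Completed datafusion v\S+ (.*?)\s*in ([\d.]+)s$")


def parse_timings(text):
    """Return (label, seconds) pairs for every matching line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            rows.append((match.group(1), float(match.group(2))))
    return rows


def categorize(label):
    """Map a target description to a coarse bucket."""
    if label == "":
        return "library"
    if label.startswith("lib"):
        return "unit tests (lib test)"
    return "integration tests"


def summarize(text):
    """Sum the reported seconds per bucket."""
    totals = defaultdict(float)
    for label, seconds in parse_timings(text):
        totals[categorize(label)] += seconds
    return dict(totals)


if __name__ == "__main__":
    for bucket, seconds in summarize(TIMINGS).items():
        print(f"{bucket}: {seconds:.1f}s")
```

Read this way, the unit-test build of the library is the single largest unit, which is consistent with the plan to move test code out of the datafusion crate and into integration tests.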

@alamb
Contributor

alamb commented Feb 5, 2022

@jimexist has a proposal to break datafusion into smaller crates here: #1750

@alamb
Contributor

alamb commented Mar 9, 2024

We have broken the main datafusion crates up substantially since this ticket was filed. I think it is no longer specifically actionable, so I am closing it

Not that we can't improve compile time more, of course, just let's file tickets for more specific new symptoms

@alamb alamb closed this as completed Mar 9, 2024

4 participants