Large projects see significant slowdown in resolve_graph using v0.21.0 #4012
@isaacsantelli Weird! Thanks for the bug report. It looks like the slowdown is happening before dbt starts running nodes. Could you check the debug-level logs (logs/dbt.log) and share what you see?
Attached is the log from today: dbt.log
@isaac-taylor Thanks for the quick response. It looks like all metadata queries are running quite quickly. The weird thing I'm noticing is a significant slowdown associated with firing some anonymous usage events:
I don't know the details of your development/deployment environment, whether those requests might be passing through some sort of proxy, or whether the issue is with event collection on our end. Could you try disabling anonymous usage tracking and see if that speeds up your invocations?
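If it helps, here's a rough way to compare timings before and after the change. This is a throwaway sketch, not dbt tooling; it assumes dbt is on your PATH and that tracking is toggled via the documented send_anonymous_usage_stats key under config: in ~/.dbt/profiles.yml.

    # Rough sketch: time `dbt compile` before and after disabling usage tracking.
    import subprocess
    import time

    def timed_compile(label):
        start = time.monotonic()
        subprocess.run(["dbt", "compile"], check=True)
        print(f"{label}: {time.monotonic() - start:.1f}s")

    # First run, with tracking on (the default).
    timed_compile("tracking enabled")
    input(
        "Now add the following to ~/.dbt/profiles.yml and press Enter:\n"
        "config:\n"
        "  send_anonymous_usage_stats: false\n"
    )
    # Second run, with tracking off.
    timed_compile("tracking disabled")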
I'm experiencing the same issue and disabling user tracking doesn't help. Here's the log entry before disabling usage tracking:
Here's after:
The output of dbt --version:
The operating system you're using:
The output of python --version:
I set it as suggested, and the logs show a gap of about 2 minutes with nothing in between.
Also, if it would be helpful info: we have a DC2 Redshift cluster.
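In case it helps to pin down exactly where that gap falls, here's a throwaway script; it assumes each dbt.log line starts with a YYYY-MM-DD HH:MM:SS timestamp (adjust the pattern if your log format differs):

    # Print pairs of consecutive dbt.log lines whose timestamps are more than
    # 60 seconds apart, to locate the silent gap.
    import re
    from datetime import datetime

    PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

    prev_time, prev_line = None, None
    with open("logs/dbt.log") as f:
        for line in f:
            match = PATTERN.match(line)
            if not match:
                continue
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if prev_time and (current - prev_time).total_seconds() > 60:
                gap = (current - prev_time).total_seconds()
                print(f"--- gap of {gap:.0f}s ---")
                print(prev_line.rstrip())
                print(line.rstrip())
            prev_time, prev_line = current, line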
@ilmari-aalto @isaacsantelli Thanks for that additional info. I've been able to reproduce the issue by compiling a project with a lot of tests:
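Roughly along these lines; this is an illustrative sketch of such a setup (a generator script assuming a stock dbt project layout, with arbitrary numbers), not necessarily the exact project used:

    # Throwaway generator: create N trivial models, each with two schema tests,
    # inside an existing dbt project's models/ directory.
    import os

    N_MODELS = 1000
    os.makedirs("models/generated", exist_ok=True)

    schema_lines = ["version: 2", "", "models:"]
    for i in range(N_MODELS):
        name = f"gen_model_{i}"
        with open(f"models/generated/{name}.sql", "w") as f:
            f.write("select 1 as id\n")
        schema_lines += [
            f"  - name: {name}",
            "    columns:",
            "      - name: id",
            "        tests:",
            "          - unique",
            "          - not_null",
        ]

    with open("models/generated/schema.yml", "w") as f:
        f.write("\n".join(schema_lines) + "\n")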
Based on the logging, I believe the primary delay is cropping up after parsing (manifest construction), during graph construction. (I'm not sure yet what's behind the second delay.) @leahwicz I'm pretty sure this is related to the new dependency linking that supports test blocking in dbt build. To confirm this, I added timestamped lines to the beginning and end of the linking step.
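In spirit, that check looks something like the sketch below; this is illustrative Python only, and the wrapped function name is a placeholder rather than dbt's real internals:

    # Log timestamped lines at the start and end of a suspect step to measure it.
    import logging
    import time

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    logger = logging.getLogger("timing-check")

    def log_timing(label, fn, *args, **kwargs):
        logger.info("%s: started", label)
        start = time.monotonic()
        result = fn(*args, **kwargs)
        logger.info("%s: finished in %.2fs", label, time.monotonic() - start)
        return result

    # Hypothetical usage; `link_graph` is a stand-in name, not a real dbt function:
    # graph = log_timing("dependency linking", link_graph, manifest)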
This issue isn't unique to Redshift, and it's especially painful because partial parsing can't help here—users are going to feel this pain during every invocation of a runnable task. I think our options are:
Thanks for looking at it quickly @jtcohen6! I wonder what's behind the second delay, but perhaps it was always there and I just didn't pay attention to it 😛. To be clear, the slowness doesn't affect our production runs, where the delay isn't relevant, but it does slow down interactive, local development. I always thought our project was small to mid-size, but now I'm seeing it can be considered large 😄
Our project is fair-sized: 0.21.0 is very slow to start after that line, even with partial parsing enabled. Snowflake here.
Describe the bug
Since upgrading to 0.21.0 I have seen a significant slowdown in the speed at which dbt runs. This includes both dbt compile and dbt run. I am running on a Redshift data warehouse, though I don't know if the issue is Redshift-specific.
Steps To Reproduce
Simply run dbt run or dbt compile; with partial parsing enabled or disabled, the issue occurs either way.
Expected behavior
I saw a 3-5x slowdown, so I expected it to run significantly faster.
Screenshots and log output
System information
Which database are you using dbt with? Redshift
The output of dbt --version: 0.21.0
The operating system you're using: macOS Big Sur 11.3.1
The output of python --version: Python 3.7.5
Additional context
Add any other context about the problem here.