Perf/improve jobdao query #2609
Signed-off-by: Abdallah Terrab <[email protected]>
Thanks for opening your first pull request in the Marquez project! Please check out our contributing guidelines (https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md).
Codecov Report
@@            Coverage Diff            @@
##              main    #2609    +/-   ##
=========================================
  Coverage    83.31%   83.31%
  Complexity    1289     1289
=========================================
  Files          243      243
  Lines         5940     5940
  Branches       280      280
=========================================
  Hits          4949     4949
  Misses         844      844
  Partials       147      147
The queries are equivalent and the new one is proven to be faster, so the PR deserves an approval 👍 🔢 🥇
I thought that CTEs just work like copy-pasting the syntax to make a query more readable, without affecting the query plan or performance. In this case, it looks like the CTE allowed the LIMIT to be applied before the joins. Great finding.
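To make the "LIMIT before the joins" point concrete, here is a small in-memory Java sketch. It is not the Marquez query and does not touch a database. `Job`, `JobWithFacets`, `lookupFacets`, `joinThenLimit`, `limitThenJoin` and the sample data are all hypothetical. `joinThenLimit` mimics the original shape: every job in the namespace is joined to its facets, and only then sorted and cut down to `limit` rows. `limitThenJoin` mimics what the CTE enables: pick the page of jobs first, then join only those rows. The two return the same rows only under the sketch's assumptions: the join neither drops nor duplicates job rows, and the sort key comes from the jobs side.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

// Hypothetical, in-memory illustration of "LIMIT before JOIN"; not the actual Marquez SQL.
public class LimitBeforeJoinSketch {

    // Hypothetical row shapes, loosely modeled on a jobs table plus a per-job facets lookup.
    record Job(String namespace, String name) {}
    record JobWithFacets(String namespace, String name, List<String> facets) {}

    // Stands in for joining one job row to its facets; counts how often the "join" runs.
    static List<String> lookupFacets(Map<String, List<String>> facetsByJob, String jobName,
                                     AtomicInteger lookups) {
        lookups.incrementAndGet();
        return facetsByJob.getOrDefault(jobName, List.of());
    }

    // Original shape: join every job in the namespace, then sort and keep the first `limit` rows.
    static List<JobWithFacets> joinThenLimit(List<Job> jobs, Map<String, List<String>> facetsByJob,
                                             String namespace, int limit, AtomicInteger lookups) {
        return jobs.stream()
                .filter(j -> j.namespace().equals(namespace))
                // sorted() must see every element, so this map runs for all matching jobs.
                .map(j -> new JobWithFacets(j.namespace(), j.name(),
                        lookupFacets(facetsByJob, j.name(), lookups)))
                .sorted(Comparator.comparing(JobWithFacets::name))
                .limit(limit)
                .collect(Collectors.toList());
    }

    // CTE-style shape: select the page of jobs first, then join only those rows.
    static List<JobWithFacets> limitThenJoin(List<Job> jobs, Map<String, List<String>> facetsByJob,
                                             String namespace, int limit, AtomicInteger lookups) {
        List<Job> page = jobs.stream()
                .filter(j -> j.namespace().equals(namespace))
                .sorted(Comparator.comparing(Job::name))
                .limit(limit)
                .collect(Collectors.toList());
        return page.stream()
                .map(j -> new JobWithFacets(j.namespace(), j.name(),
                        lookupFacets(facetsByJob, j.name(), lookups)))
                .collect(Collectors.toList());
    }

    // Synthetic data: n jobs split evenly between "MyNameSpace" and "other".
    static List<Job> sampleJobs(int n) {
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            jobs.add(new Job(i % 2 == 0 ? "MyNameSpace" : "other", String.format("job_%04d", i)));
        }
        return jobs;
    }

    // One synthetic facet per job.
    static Map<String, List<String>> sampleFacets(List<Job> jobs) {
        Map<String, List<String>> facets = new HashMap<>();
        for (Job j : jobs) {
            facets.put(j.name(), List.of("facet_for_" + j.name()));
        }
        return facets;
    }

    public static void main(String[] args) {
        List<Job> jobs = sampleJobs(1000);
        Map<String, List<String>> facets = sampleFacets(jobs);
        AtomicInteger slow = new AtomicInteger();
        AtomicInteger fast = new AtomicInteger();
        List<JobWithFacets> a = joinThenLimit(jobs, facets, "MyNameSpace", 25, slow);
        List<JobWithFacets> b = limitThenJoin(jobs, facets, "MyNameSpace", 25, fast);
        System.out.println("same rows: " + a.equals(b));
        System.out.println("lookups, join then limit: " + slow.get());
        System.out.println("lookups, limit then join: " + fast.get());
    }
}
```

On this synthetic data both methods return the same 25 rows. The first performs 500 facet lookups (one per job in the namespace) and the second only 25. Under these assumptions, the join work scales with the page size rather than with the number of jobs in the namespace.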
@algorithmy1 First-class work. Thank you.
Thank you for the compliment 🥰 By the way, we can improve the query without using
Great job! Congrats on your first merged pull request in the Marquez project!
Problem
Hello,
While working with Marquez, I noticed a significant performance bottleneck in a specific SQL query in the JobDao.java file. For the namespaceName "MyNameSpace", the query originally took 17 seconds to execute with a limit of 100, and 12 seconds with a limit of 25. Because this query runs every time the Marquez web UI is accessed, this was a major user-experience problem. These timings were measured on a db.t4g.medium instance (vCPU: 2, RAM: 4 GB).
See: #2608
Solution
To address this, I've revised the query. The optimized query uses Common Table Expressions (CTEs) to select the required rows before the join, so the joins operate on a much smaller set of rows. Here's the optimized query:
On the same cluster (db.t4g.medium, vCPU: 2, RAM: 4 GB), the optimization reduced the execution time from 17 seconds to a mere 300 ms with limit=100, and from 12 seconds to under 100 ms with limit=25.

Furthermore, I believe there is potential for even more optimization: if job_facets_view included the column namespace_name, the query could be refined further.

One-line summary: Optimized a critical SQL query in JobDao.java, significantly reducing its execution time.

Checklist
- CHANGELOG.md (depending on the change, this may not be necessary).
- .sql database schema migration according to Flyway's naming convention (if relevant).