Slow Dataset Query Updates #2534

Merged: phixMe merged 4 commits into main from update/datasets-sql on Jul 17, 2023

Conversation

@phixMe (Member) commented Jun 30, 2023

Problem

The dataset queries in DatasetDao were slow for Marquez instances with many datasets, dataset versions, and facets.

Solution

This PR narrows the nested facet queries in the dataset queries so that they cover only what the outer query selects.

One-line summary: Scopes down nested facet queries to be the same scope as the outer query.
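To illustrate the idea of scoping a nested query to the outer query, here is a minimal sketch. It is not the SQL from DatasetDao: the `datasets` and `dataset_facets` tables, their columns, and the facet-count aggregation are hypothetical, and SQLite (via Python's standard library) stands in for the PostgreSQL database that Marquez runs on. The sketch only shows that the scoped form returns the same rows while the nested query touches facets for just the selected datasets.

```python
import sqlite3

# Hypothetical, simplified schema; not the real Marquez tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datasets (uuid INTEGER PRIMARY KEY, namespace TEXT, name TEXT);
    CREATE TABLE dataset_facets (dataset_uuid INTEGER, name TEXT, facet TEXT);
    CREATE INDEX dataset_facets_dataset_uuid ON dataset_facets (dataset_uuid);
""")
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [(i, "ns_a" if i % 2 == 0 else "ns_b", f"dataset_{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO dataset_facets VALUES (?, ?, ?)",
    [(i, f"facet_{j}", "{}") for i in range(1, 101) for j in range(3)],
)

# Unscoped: the nested query aggregates facets for every dataset,
# even though the outer query only keeps a small page of them.
UNSCOPED = """
    SELECT d.name, f.facet_count
    FROM datasets d
    LEFT JOIN (
        SELECT dataset_uuid, COUNT(*) AS facet_count
        FROM dataset_facets
        GROUP BY dataset_uuid
    ) f ON f.dataset_uuid = d.uuid
    WHERE d.namespace = :namespace
    ORDER BY d.name
    LIMIT :limit
"""

# Scoped: the outer selection is computed first, and the nested facet
# query is restricted to exactly those datasets.
SCOPED = """
    WITH selected AS (
        SELECT uuid, name
        FROM datasets
        WHERE namespace = :namespace
        ORDER BY name
        LIMIT :limit
    )
    SELECT s.name, f.facet_count
    FROM selected s
    LEFT JOIN (
        SELECT dataset_uuid, COUNT(*) AS facet_count
        FROM dataset_facets
        WHERE dataset_uuid IN (SELECT uuid FROM selected)
        GROUP BY dataset_uuid
    ) f ON f.dataset_uuid = s.uuid
    ORDER BY s.name
"""

params = {"namespace": "ns_a", "limit": 5}
unscoped_rows = conn.execute(UNSCOPED, params).fetchall()
scoped_rows = conn.execute(SCOPED, params).fetchall()
```

Both statements are expected to return the same page of results. Any performance difference depends on the database, its planner, and the data volume, so it should be confirmed with a query plan on a realistic instance.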

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Jun 30, 2023
@codecov bot commented Jun 30, 2023

Codecov Report

Merging #2534 (15aef67) into main (e99ebc9) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2534   +/-   ##
=========================================
  Coverage     83.86%   83.86%           
  Complexity     1245     1245           
=========================================
  Files           238      238           
  Lines          5657     5657           
  Branches        271      271           
=========================================
  Hits           4744     4744           
  Misses          769      769           
  Partials        144      144           
Impacted Files                                  Coverage Δ
api/src/main/java/marquez/db/DatasetDao.java    98.64% <ø> (ø)

@wslulciuc (Member) left a comment

@phixMe, great work on identifying ways to optimize our dataset query! Would you mind adding a query plan, or an analysis of the query before and after the change? I agree with the changes, but I think an analysis would help us better understand their effect.

@wslulciuc (Member) left a comment

We discussed offline, these changes look great! Thanks for the perf improvements 💯

@phixMe phixMe merged commit 52b70a7 into main Jul 17, 2023
@phixMe phixMe deleted the update/datasets-sql branch July 17, 2023 18:26