Update insert job function to avoid joining on symlinks for jobs that have no symlinks #2144

Merged
collado-mike merged 2 commits into main from fix/insert_job_perf
Sep 27, 2022


collado-mike
Collaborator

Problem

Typical Marquez installations rarely create a large number of new jobs. In a small number of installations, however, many new jobs are created on a regular basis, and each insert executes the rewrite_jobs_fqn_table function, putting stress on the backing database. Most of the compute cost of this function goes to computing the symlinks and aliases for jobs, even when the inserted job has no symlink.

Closes: #ISSUE-NUMBER

Solution

Adding a check for the symlink field and using a lower-cost query when no symlink is present (the norm) radically reduces the database compute load in Marquez installations that frequently create a large number of new jobs.
The following graph shows query count, query latency, and database CPU utilization under a test load of many new jobs being created. The test load was several days of real production OpenLineage events replayed on a dev instance. To verify the results, I ran the same test twice for both the old query and the new one. Under heavy load, the old job creation query drives database CPU utilization to 100% and query latency as high as 2 seconds. Under the same load (I renamed all of the jobs in the database, so the replayed events show up as new jobs that invoke the job creation query), the new query keeps CPU utilization at around 30% and query latency at around 300 microseconds.

Note that the query latency in this graph is shown at log scale (right axis). Otherwise, the latency for the new query would be indistinguishable from 0.
[Screenshot, 2022-09-26: query count, query latency (log scale, right axis), and database CPU utilization under the test load]
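
The check described in the Solution can be sketched roughly as follows. This is not the actual SQL from this PR: it is a minimal Python model of the idea, and the `Job` class, its field names, the `namespace.name` form of the fully qualified name, and the `resolve_fqn_on_insert` helper are all hypothetical names introduced for illustration. The point is only the branch: a job with no symlink takes a cheap path that never touches other job rows, while a job with a symlink still pays for the full resolution.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified job row; field names are illustrative only."""
    uuid: str
    namespace: str
    name: str
    symlink_target_uuid: Optional[str] = None


def fqn(job: Job) -> str:
    # Assumption: the fully qualified name is modeled as "namespace.name".
    return f"{job.namespace}.{job.name}"


def resolve_fqn_on_insert(new_job: Job, jobs_by_uuid: Dict[str, Job]) -> str:
    """Return the fully qualified name to record for a newly inserted job."""
    # Cheap path (the common case): no symlink, so no other rows are consulted.
    if new_job.symlink_target_uuid is None:
        return fqn(new_job)

    # Expensive path: follow the symlink chain to its final target.
    seen = {new_job.uuid}
    current = new_job
    while current.symlink_target_uuid is not None:
        target = jobs_by_uuid.get(current.symlink_target_uuid)
        if target is None or target.uuid in seen:
            break  # dangling link or cycle: stop at the last resolvable job
        seen.add(target.uuid)
        current = target
    return fqn(current)


# Usage: a plain job resolves without any lookup; a symlinked job is followed.
plain = Job("1", "ns", "etl.load")
renamed = Job("3", "ns", "new_name")
legacy = Job("2", "ns", "old_name", symlink_target_uuid="3")
table = {j.uuid: j for j in (plain, renamed, legacy)}
print(resolve_fqn_on_insert(plain, table))
print(resolve_fqn_on_insert(legacy, table))
```

In this model the cheap path does constant work regardless of how many jobs exist, which mirrors why skipping the symlink computation helps most in installations that insert many symlink-free jobs.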

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Sep 26, 2022
Member

@wslulciuc wslulciuc left a comment

Left some minor comments, otherwise thanks for the amazing write up and analysis accompanying the fix @collado-mike 💯 🥇

@codecov

codecov bot commented Sep 27, 2022

Codecov Report

Merging #2144 (1b16062) into main (c5fc6bf) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2144   +/-   ##
=========================================
  Coverage     75.30%   75.30%           
  Complexity     1038     1038           
=========================================
  Files           203      203           
  Lines          4883     4883           
  Branches        399      399           
=========================================
  Hits           3677     3677           
  Misses          763      763           
  Partials        443      443           


@collado-mike collado-mike merged commit bb3d163 into main Sep 27, 2022
@collado-mike collado-mike deleted the fix/insert_job_perf branch September 27, 2022 18:51