fix downstream recursion #2181

Merged: pawel-big-lebowski merged 1 commit into main from column-lineage-fix-downstream on Oct 13, 2022

pawel-big-lebowski (Collaborator)

Signed-off-by: Pawel Leszczynski <[email protected]>

Problem

Traversing the lineage graph along both upstream and downstream edges leads to cycles, and the same node is added to the recursive table multiple times.
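A minimal Python sketch of the problem, under assumed data. The edge list and the helper names (neighbours, naive_walk, guarded_walk) are hypothetical and are not Marquez code. If every edge is followed in both directions, t1.a -> t2.a immediately becomes a two-node loop, so an unguarded expansion keeps re-emitting the same three columns. The visited-set walk is a simpler stand-in for the per-path check this PR adopts. It is included only to contrast the two outputs.

```python
from collections import deque

# Hypothetical column-lineage edges (source, target): t2.a is derived from t1.a, t3.a from t2.a.
EDGES = [("t1.a", "t2.a"), ("t2.a", "t3.a")]


def neighbours(node, edges):
    """Follow edges downstream (source -> target) and upstream (target -> source)."""
    downstream = [t for s, t in edges if s == node]
    upstream = [s for s, t in edges if t == node]
    return downstream + upstream


def naive_walk(start, edges, max_depth):
    """Expand every neighbour at every level with no cycle check; return the number of rows emitted.

    Without max_depth this would never stop: t1.a -> t2.a -> t1.a -> ...
    """
    rows = 0
    frontier = [start]
    for _ in range(max_depth + 1):
        rows += len(frontier)
        frontier = [n for node in frontier for n in neighbours(node, edges)]
    return rows


def guarded_walk(start, edges):
    """Breadth-first walk that skips already-visited nodes, so each node is emitted once."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for n in neighbours(node, edges):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen
```

With max_depth=5 the naive walk emits 14 rows for only three distinct columns, and the row count keeps growing with the depth limit. The guarded walk returns each of the three columns exactly once.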

Solution

Apply the cycle-detection technique described in https://www.postgresql.org/docs/current/queries-with.html, using a query similar to the one below:

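-- graph(id, link, data): each row links node id to node link.
WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
    -- Anchor: every row starts a path that contains only its own id.
    SELECT g.id, g.link, g.data, 0,
      false,
      ARRAY[g.id]
    FROM graph g
  UNION ALL
    -- Recursive step: follow the link; flag a cycle if the node is already on the path.
    SELECT g.id, g.link, g.data, sg.depth + 1,
      g.id = ANY(path),
      path || g.id
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT is_cycle  -- rows that closed a cycle are not expanded
)
SELECT * FROM search_graph;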
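The same cycle guard can be tried locally without a PostgreSQL server. Below is a sketch of a port to SQLite through Python's sqlite3 module, and it rests on several assumptions. SQLite has no array type, so path is a comma-delimited string, and instr() replaces g.id = ANY(path). The data column is dropped. The EDGES sample and the search_graph helper are hypothetical.

```python
import sqlite3

# Hypothetical sample edges (id -> link); 3 -> 1 closes the cycle 1 -> 2 -> 3 -> 1.
EDGES = [(1, 2), (2, 3), (3, 1), (3, 4)]

# path is a string such as ",1,2,3,"; instr() finds ",<id>," in it, mimicking "g.id = ANY(path)".
QUERY = """
WITH RECURSIVE search_graph(id, link, depth, is_cycle, path) AS (
    SELECT g.id, g.link, 0, 0, ',' || g.id || ','
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, sg.depth + 1,
      instr(sg.path, ',' || g.id || ',') > 0,
      sg.path || g.id || ','
    FROM graph g JOIN search_graph sg ON g.id = sg.link
    WHERE NOT sg.is_cycle
)
SELECT id, link, depth, is_cycle, path FROM search_graph
"""


def search_graph(edges):
    """Load edges into an in-memory table and return every row of the cycle-guarded recursive CTE."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE graph (id INTEGER, link INTEGER)")
        conn.executemany("INSERT INTO graph (id, link) VALUES (?, ?)", edges)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in search_graph(EDGES):
        print(row)
```

On this sample the query terminates at depth 3 and returns 16 rows. Four of those rows have is_cycle = 1; each one closes a cycle and is not expanded further. The path check stops the infinite recursion, but the same edge still appears once for every distinct path that reaches it.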

The same result can be achieved with the CYCLE id SET is_cycle USING path clause, but that clause was introduced in PostgreSQL 14 and is not available in PostgreSQL 12.

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

boring-cyborg bot added the api (API layer changes) label on Oct 12, 2022
codecov bot commented on Oct 12, 2022

Codecov Report

Merging #2181 (3794134) into main (7b6265d) will increase coverage by 11.23%.
The diff coverage is 100.00%.

@@              Coverage Diff              @@
##               main    #2181       +/-   ##
=============================================
+ Coverage     65.31%   76.54%   +11.23%     
- Complexity      193     1108      +915     
=============================================
  Files            35      214      +179     
  Lines           816     5181     +4365     
  Branches         90      408      +318     
=============================================
+ Hits            533     3966     +3433     
- Misses          154      760      +606     
- Partials        129      455      +326     
Impacted Files Coverage Δ
api/src/main/java/marquez/db/ColumnLineageDao.java 100.00% <ø> (ø)
...ain/java/marquez/service/ColumnLineageService.java 97.19% <100.00%> (ø)
api/src/main/java/marquez/service/JobMetrics.java 91.66% <0.00%> (ø)
api/src/main/java/marquez/db/JobVersionDao.java 91.04% <0.00%> (ø)
...main/java/marquez/common/models/NamespaceName.java 80.00% <0.00%> (ø)
...ain/java/marquez/service/models/StreamVersion.java 75.00% <0.00%> (ø)
...ain/java/marquez/tracing/TracingServletFilter.java 0.00% <0.00%> (ø)
...c/main/java/marquez/service/models/SourceMeta.java 100.00% <0.00%> (ø)
...a/marquez/api/exceptions/RunNotFoundException.java 100.00% <0.00%> (ø)
...c/main/java/marquez/tracing/SentryPropagating.java 56.75% <0.00%> (ø)
... and 171 more


pawel-big-lebowski merged commit 2d80f4c into main on Oct 13, 2022
pawel-big-lebowski deleted the column-lineage-fix-downstream branch on October 13, 2022 09:34