Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On mostly-duplicate git urls, pick whichever came first (#1084) #1428

Merged
merged 2 commits into from
May 9, 2019

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Apr 29, 2019

Fixes #1084

The first encountered of duplicate URLs (the same URL with .git or without .git are considered duplicates) is used as the official URL.

@drewbanin drewbanin self-requested a review April 29, 2019 21:54
@drewbanin
Copy link
Contributor

I checked out your comment here and I think I follow your stated approaches.

I'm happy with the approach presented here, but I also think there could be merit to resolving this problem only when a conflict occurs. In the issue, you noted:

Or if it should at all - maybe we should have just recognized immediately that they were the same package name and allowed one to "win"?

This actually sounds like the best possible option because we:

  1. would only do something different than what the user specified when we had no choice
  2. would have confidence that both foo and foo.git are valid, as presumably human beings decided to commit them both into packages.yml files

What's involved in a pick-a-winner approach?

@beckjake
Copy link
Contributor Author

The pick-a-winner approach is a bit more complicated - I'll modify ProjectList to change this bit to check if the incorporated package is a GItPackage, and if so do some additional checks.

First write wins, on the assumption that all matching URLs are valid
@beckjake beckjake force-pushed the fix/strip-url-trailing-dotgit branch from 590c310 to 3af88b0 Compare April 30, 2019 16:13
@beckjake beckjake changed the title strip any trailing .git from git package URLs (#1084) On mostly-duplicate git urls, pick whichever came first (#1084) Apr 30, 2019
@beckjake beckjake marked this pull request as ready for review May 1, 2019 13:09
@@ -216,9 +216,17 @@ class GitPackage(Package):

def __init__(self, *args, **kwargs):
super(GitPackage, self).__init__(*args, **kwargs)
# strip trailing '.git's from urls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# strip trailing '.git's from urls


def __getitem__(self, key):
if isinstance(key, basestring):
return super(PackageListing, self).__getitem__(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When i use the following package spec, I see a backtrace on this line:

packages:
  - git: "https://github.com/fishtown-analytics/dbt-utils"
    revision: 0.1.12

  - git: "https://github.com/fishtown-analytics/dbt-event-logging.git"
    revision: 0.1.5
2019-05-07 21:48:28,778 (MainThread): Traceback (most recent call last):
  File "/Users/drew/fishtown/dbt/core/dbt/main.py", line 80, in main
    results, succeeded = handle_and_check(args)
  File "/Users/drew/fishtown/dbt/core/dbt/main.py", line 153, in handle_and_check
    task, res = run_from_args(parsed)
  File "/Users/drew/fishtown/dbt/core/dbt/main.py", line 201, in run_from_args
    results = task.run()
  File "/Users/drew/fishtown/dbt/core/dbt/task/deps.py", line 525, in run
    final_deps[name].resolve_version()
  File "/Users/drew/fishtown/dbt/core/dbt/task/deps.py", line 397, in __getitem__
    return super(PackageListing, self).__getitem__(key)
KeyError: 'https://github.com/fishtown-analytics/dbt-utils.git'

I don't totally understand what's happening here, or why this case is different than the example provided in the integration test in this PR. Can you check it out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like while evaluating subpackages, dbt was using the chosen canonical key of the package instead of the package itself for the collision check, which bypassed the extra other_name handling. That's no longer valid since the two dicts might have distinct keys that should match!

remove incorrect comment
fix an issue where we looked deps up by the wrong key
add a test
@beckjake beckjake force-pushed the fix/strip-url-trailing-dotgit branch from 905bdbc to 12f0887 Compare May 8, 2019 16:24
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working great after that last commit.

What do you make of that test failure? Is that just a transient CI issue? LGTM when you feel good about whatever happened there :)

@beckjake
Copy link
Contributor Author

beckjake commented May 8, 2019

I've seen this a couple times now. It's a transient CI-only thing (I can never reproduce it locally) - the thread that is supposed to send the kill seems to not manage to do so in the 15s it takes the pg_sleep(15) to run. I've seen behavior like this on resource-starved VMs and cloud CI services before. I assume the thread calling kill just never gets scheduled until the query is done. Re-running the test always seems to fix it.

Maybe we could take the lazy way out and decorate it with this pytest plugin?

@drewbanin
Copy link
Contributor

Flaky looks cool! not something to solve in this PR though.... I just kicked off the tests again - let's merge when they're all 🍏 📗 💚 🥗 🇬🇱

@beckjake beckjake merged commit 3a7f931 into dev/wilt-chamberlain May 9, 2019
@beckjake beckjake deleted the fix/strip-url-trailing-dotgit branch May 9, 2019 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect package duplicate error when using/missing .git extension
2 participants