Incorrect package duplicate error when using/missing .git extension #1084

Closed
DylanBaker opened this issue Oct 23, 2018 · 4 comments · Fixed by #1428

@DylanBaker


Issue description

dbt seems to treat packages as not identical (but with duplicate name) if one reference has the .git file extension and another does not. This is problematic when adding a dependency that uses a package a project already includes.

Results

I had a project that was pulling the dbt-utils package. When I set up the dbt-event-logging package, despite pulling the same version/release, dbt returned the following error:

Found duplicate project dbt_utils. 
This occurs when a dependency has the same project name as some other dependency.

My packages.yml looked like:

packages:
  - git: "https://github.com/fishtown-analytics/dbt-utils"
    revision: 0.1.12

  - git: "https://github.com/fishtown-analytics/dbt-event-logging.git"
    revision: 0.1.5

The packages.yml file in dbt-event-logging looks like:

packages:
  - git: "https://github.com/fishtown-analytics/dbt-utils.git"
    revision: 0.1.12

The problem was resolved by adding the .git suffix to the dbt-utils package in my project.
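
The behavior reported above can be reproduced with a minimal sketch. This is not dbt's actual code: `find_duplicate_projects` is a hypothetical helper, and the project name `event_logging` is a placeholder. It only assumes that dependencies are distinguished by their raw URL string, so two spellings of the same repository look like two different packages sharing one project name.

```python
# Hypothetical sketch, not dbt's implementation: dependencies are
# distinguished by the raw git URL string exactly as written.
def find_duplicate_projects(deps):
    """deps is a list of (git_url, project_name) pairs.

    Returns the project names that show up under more than one
    distinct URL string.
    """
    urls_by_name = {}
    for url, name in deps:
        urls_by_name.setdefault(name, set()).add(url)
    return sorted(name for name, urls in urls_by_name.items() if len(urls) > 1)


deps = [
    # Declared in the root project, without the .git suffix.
    ("https://github.com/fishtown-analytics/dbt-utils", "dbt_utils"),
    # "event_logging" is a placeholder project name for illustration.
    ("https://github.com/fishtown-analytics/dbt-event-logging.git", "event_logging"),
    # Declared by dbt-event-logging, with the .git suffix.
    ("https://github.com/fishtown-analytics/dbt-utils.git", "dbt_utils"),
]

print(find_duplicate_projects(deps))  # ['dbt_utils']
```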

System information

The output of dbt --version:

installed version: 0.10.1
   latest version: 0.11.1

The operating system you're running on: Windows

The python version you're using: 3.6.0

@drewbanin drewbanin added the bug Something isn't working label Oct 23, 2018
@drewbanin
Contributor

Thanks for the report @DylanBaker!

@drewbanin drewbanin added this to the Wilt Chamberlain milestone Nov 28, 2018
@beckjake beckjake self-assigned this Apr 29, 2019
@beckjake
Contributor

beckjake commented Apr 29, 2019

There are a couple ways we can fix this:

  • strip the .git from all URLs passed to GitPackage. That does seem to work fine for both https:// and git:// URLs on github, though I'm not sure how correct that is in the strict protocol sense. I'd feel pretty bad if that only works for github.
  • override PackageListing's __contains__/__getitem__/__setitem__ to treat these almost-collisions as collisions.
  • either of the above, but with per-protocol handling of whether to strip or add the .git (strip on https://, add on git://?)
  • No text munging at all, but have an error if you are missing the .git or have it and should not.
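
As a rough illustration of the first two options, here is a sketch that strips a trailing .git before comparing, and a mapping that overrides `__contains__`/`__getitem__`/`__setitem__` so near-collisions count as collisions. `strip_dot_git` and `NormalizedListing` are hypothetical names, not dbt's real `GitPackage` or `PackageListing`, and whether stripping is safe for every git host is exactly the open question raised above.

```python
def strip_dot_git(url):
    """Return url without trailing slashes or a trailing '.git'."""
    url = url.rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


class NormalizedListing:
    """Hypothetical stand-in for a package listing keyed by git URL.

    Keys are normalized on every access, so URLs that differ only by
    a trailing '.git' refer to the same entry.
    """

    def __init__(self):
        self._packages = {}

    def __contains__(self, url):
        return strip_dot_git(url) in self._packages

    def __getitem__(self, url):
        return self._packages[strip_dot_git(url)]

    def __setitem__(self, url, package):
        self._packages[strip_dot_git(url)] = package

    def __len__(self):
        return len(self._packages)


listing = NormalizedListing()
listing["https://github.com/fishtown-analytics/dbt-utils"] = "0.1.12"

# The .git spelling is treated as the same package.
print("https://github.com/fishtown-analytics/dbt-utils.git" in listing)  # True
```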

Part of the issue here is that github (and probably other hosted git services?) are pretty permissive about what URLs map to the same resources! It's not clear to me how dbt should resolve problems like that. Or if it should at all - maybe we should have just recognized immediately that they were the same package name and allowed one to "win"?

@DylanBaker
Author

DylanBaker commented Apr 30, 2019

I think the problem with no text munging at all is that you could have two different packages that use different syntaxes of a shared package's URL, and it wouldn't be in your control to change it unless you forked the repo.

i.e. I use Package A and Package B in my project. Package A uses dbt-utils with a trailing .git. Package B uses dbt-utils without a trailing .git. There's no easy recourse for me as the end-user to fix that (particularly for less technical users). Even if the error is more descriptive, it's probably not immediately useful.

I appreciate the scenario I'm describing is probably the minority of cases though.

@drewbanin
Contributor

@DylanBaker check out the discussion in the PR too: #1428

I think that rather than text munging, we might be able to identify when two stated git urls point to the same repo, and then just pick one of them. This effectively does some amount of "munging" for us, but it changes the paradigm from "dbt will remove your .git suffix" to "dbt will resolve conflicts by picking a winner".

I'm just not 100% certain that every git server out there supports cloning git urls both with and without a .git suffix, and I'd prefer that we keep dbt deps as git-provider-agnostic as we can!
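
A minimal sketch of the "pick a winner" idea, assuming the first spelling seen wins and that the winning URL is kept exactly as the user wrote it, so nothing is rewritten before cloning. `resolve_git_urls` and `_comparison_key` are hypothetical names, not the code merged in the PR, and the comparison is purely textual: it does not verify that both URLs really serve the same repository.

```python
def _comparison_key(url):
    """Key used only to detect near-duplicates; never used for cloning."""
    url = url.rstrip("/")
    return url[: -len(".git")] if url.endswith(".git") else url


def resolve_git_urls(urls):
    """Drop near-duplicate git URLs, keeping the first spelling seen.

    The surviving URL is returned unmodified, so the git server is
    always asked for the exact URL some project declared.
    """
    winners = {}
    for url in urls:
        # setdefault keeps the earlier entry if the key already exists.
        winners.setdefault(_comparison_key(url), url)
    return list(winners.values())


declared = [
    "https://github.com/fishtown-analytics/dbt-utils",
    "https://github.com/fishtown-analytics/dbt-event-logging.git",
    "https://github.com/fishtown-analytics/dbt-utils.git",
]

for url in resolve_git_urls(declared):
    print(url)
```

Because the winner keeps its original text, this stays agnostic about whether a given server accepts both forms; it only relies on the form that was actually declared first.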

beckjake added a commit that referenced this issue May 9, 2019
…ng-dotgit

On mostly-duplicate git urls, pick whichever came first (#1084)