Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested vendor folders management. #303

Open
sgotti opened this issue Mar 5, 2016 · 10 comments
Open

nested vendor folders management. #303

sgotti opened this issue Mar 5, 2016 · 10 comments

Comments

@sgotti
Copy link

sgotti commented Mar 5, 2016

Hi,

I'm quite sure somehow this was already discussed but I'd like to put down my view.

Glide is flattening the dependency tree in the base vendor/ dir. From my understanding this mean that all the dependencies (also the inherited ones) are installed in the base vendor/ dir.

If a dependant package also vendors its dependencies, glide doesn't remove them and these ones take the precedence and are used for compilation instead of the one in the base project vendor/.

I'm temped to call this "soft flattening" .

It's clear that many projects (many monorepo so they also provides libraries) wants to vendor their dependencies (where their vendoring meaning is: commit the vendor directories in VCS), to avoid disappearing repos, going to the network to download etc...
This also looks like the golang stated solution.

Given this I can see two different possibilities to manage nested vendor folders that will have different impacts and behaviors:

  1. Keep the nested vendor folders (soft flattening) like done now by Glide:
    This is needed if my project needs a package at version X incompatible with the one vendored (committed in VCS) by the package.
    On the other side this may trigger problems if the package is designed to accept or return types of a vendored package like clearly explained by you here: https://github.com/mattfarina/golang-broken-vendor.
    Additional it'll cause code duplication etc...
  2. Remove the nested vendor folders (hard flattening):
    This will create problems if two packages have a common dependency on a package with incompatible versions. Perhaps this problem will be resolved if semver is used
    (so Glide will be able to detects conflicts and act in some ways, like keeping the nested vendor folder).
    On the other side this approach will avoid problems like https://github.com/mattfarina/golang-broken-vendor and code duplication.

What are your thoughts on this?

I can't see a default solution, perhaps the user should be able to force one or the other behavior per package dependency? (but this will mean that the user must explicitly define a repo and also subpackages, also if not an explicit dep but an inherited one, in glade.yml and define something like keepvendor: true/false)

Additionally, will removing just the nested vendor folder break some licenses like explained in http://engineeredweb.com/blog/2016/go-why-not-strip-unused-pkgs/ ?

@mattfarina
Copy link
Member

This is a good and difficult question.

I don't think there's a license issue with removing vendored packages from a dependency. But, that's the least of the problems.

Here are a couple examples.

  1. What if you don't strip VCS information. Stripping the vendor/ directories from dependencies will cause those repos to become dirty. Working with them becomes significantly harder. Especially for those who work on a dependency at the same time as the parent and push the dependency repo. This can quickly get messy to manage.
  2. If a nested dependency has a reason to store a specific version.

This makes it hard for a tool to automate things.

As a rule, libraries should not vendor/ dependencies unless they have a very good reason.

I'm not sure how to do this better than we do today for the go toolchain. There is a possible opportunity to detect if a dependency is vendored by a project and skip fetching it.

@sgotti
Copy link
Author

sgotti commented Mar 8, 2016

@mattfarina Thanks for your answer.

I just noticed other tools do (or want to do) "hard" flattening. For example I found some issues on godep like tools/godep#428 and tools/godep#224. In this last one @thockin talks about something similar to this issue (soft vs hard flattening) and, in the end, he prefers "hard" flattening (I apologize if I misunderstood something).

Probably there isn't an automatic solution given pros and cons of soft vs hard flattening and actually it's very difficult if not impossible to detect when a nested vendor is needed or not (since I can put anything inside a vendor folder without any vendoring tool information to detect the package version, sometime also without a source repo etc...).

Probably the unique solution is to let the user choose what's better.

What I'd like to propose is to add some options to Glide to give the user the ability to choose the preferred behavior but I'm not sure how to specify it (options? config file?. What I had in mind was:

  • "soft" flattening (like now)
  • "hard" flattening (remove al nested vendor/ folder at any level)
    • Define some nested vendor folders to keep (for example because that dependency uses an incompatible pkg). But the difficult part is on how can we specify what to preserve in a config options? I can think only about specifying the full path (I don't see other solutions since a vendor directory can be at any level inside a dependency not just at the root level of the dependency, plus there can be multiple level of nested vendor folders...)

If this actually doesn't fit directly inside Glide, I'm temped to initially add this to https://github.com/sgotti/glide-vc. Something like --no-nested-vendor and --nested-vendor-keep="list of paths" in addition to the current/future option (considering I'm going to change them to opt in like proposed by you). I'll also add a note specifying that doing this will change the build and also runtime (since another package is used to compile a dependency) behaviors.

What do you think about this approach?

Returning to your questions:

What if you don't strip VCS information. Stripping the vendor/ directories from dependencies will cause those repos to become dirty. Working with them becomes significantly harder. Especially for those who work on a dependency at the same time as the parent and push the dependency repo. This can quickly get messy to manage.

Good point. Will be possible to improve the--update-vendored option to also detect if a repo is dirty and act the same (replacing the repo)?

If a nested dependency has a reason to store a specific version.

See above (explicit keeping of some nested vendor directories).

As a rule, libraries should not vendor/ dependencies unless they have a very good reason.

I can understand it, but on the other side it's difficult to define why a library should not vendor a specific version of a package if it needs to.
It's more difficult to distinguish what can be defined to be a library since many packages I use are part of a larger repo and usually there's a vendor folder at the root of the project so also if I'm using just this package (without any vendor folder in it), with glide, since I'm getting the whole repo, I get also all its parent vendor folders.

@mattfarina
Copy link
Member

@sgotti at the moment I would prefer something like glide-vc handle stripping nested vendor directories. I may be persuaded to change my mind but I don't really want to add a bunch of complexity into Glide for that right now.

Dirty would be hard to detect for a vendored repo. If the VCS data isn't available the VCS can't easily do it for us. This is complicated. What's dirty? Missing files? Changed files? With a VCS it's pretty easy. Without it we just ignore it which is, in part, why updating vendored is opt-in.

I think stripping vendor directories should be in the hands of the developer. They know what they're building and can deal with the consequences. This is really why libraries should not vendor. This is, in part, why storing outside packages in your VCS isn't common (or even allowed) in some other toolchains.

Yet, I understand the issues k8s and others are having that leads to this.

@thockin
Copy link
Contributor

thockin commented Mar 16, 2016

I was running into this today, too. I want glide to see that a dep has a
Godeps file (or glide or ...) and import the deps into the root BUT NOT
save them in vendor/.

As an example: Kubernetes vendors heapster. heapster vendors a client lib
from kubernetes, which has a wide fan-out and transitively includes most of
kubernetes. I am now making a copy of most of kubernetes/... in
kubernetes/vendor/.../heapster/Godeps/_workspace/src/.../kubernetes.
Thankfully glide seems to be smart enough to not recursively vendor that
(or I would have kubernetes/vendor/.../kubernetes/... :)

Stripping out deps' vendor/ and Godeps/ and so on will save a lot of
ugliness and confusion. I don't actually know if Go will respect
repo/vendor/foo.com/bar/vendor/... or not - I hope not, but I suspect so.

On Fri, Mar 11, 2016 at 12:06 PM, Matt Farina [email protected]
wrote:

@sgotti https://github.com/sgotti at the moment I would prefer
something like glide-vc handle stripping nested vendor directories. I may
be persuaded to change my mind but I don't really want to add a bunch of
complexity into Glide for that right now.

Dirty would be hard to detect for a vendored repo. If the VCS data isn't
available the VCS can't easily do it for us. This is complicated. What's
dirty? Missing files? Changed files? With a VCS it's pretty easy. Without
it we just ignore it which is, in part, why updating vendored is opt-in.

I think stripping vendor directories should be in the hands of the
developer. They know what they're building and can deal with the
consequences. This is really why libraries should not vendor. This is, in
part, why storing outside packages in your VCS isn't common (or even
allowed) in some other toolchains.

Yet, I understand the issues k8s and others are having that leads to this.


Reply to this email directly or view it on GitHub
#303 (comment).

@mattfarina
Copy link
Member

@thockin This is where things get hard.

The way go build and the other go tools work is to look in the vendor/ directory a requesting package is in to find a dependency. If it doesn't find it there it walks up the directory tree searching in each vendor/ directory for it. If not found in a vendor/ directory it goes back to the traditional methods of the GOAPTH and then GOROOT.

This is why the discussions in places like the go nuts mailing list have been to not vendor outside dependencies in your project if you're building a library for others to use.

So yeah, repo/vendor/foo.com/bar/vendor/... is respected. When I asked about it I was told that different people may want to vendor different versions or forks in their codebase. So, it's intentional and people are doing this.

Bringing this back to Glide, we've been wary to add something that alters a codebase to Glide which is what you're asking will do. The project glide-vc is working to be able to do this with numerous options. With this installed you'll be able to do things like run:

$ glide vc

We're also going to add a scripts section to Glide. Consider this in the glide.yaml:

scripts:
  setup: go get -u github.com/sgotti/glide-vc
  install:
  - glide install --update-vendored --strip-vcs
  - glide vc

Then you can run:

$ glide run setup
$ glide run install

We need the scripts for varying reasons such as installing applications used by generators. None of the scripts would be run automatically (for security reasons). It would be great to just use make but that's not always available or cross platform.

Thoughts? This is something we may want to get on a hangout to discuss in more detail.

@thockin
Copy link
Contributor

thockin commented Mar 16, 2016

The way go build and the other go tools work

Yeah, that all makes sense EXCEPT when you're vendoring something that vendors.

This is why the discussions in places like the go nuts mailing list have been
to not vendor outside dependencies in your project if you're building a
library for others to use.

That's great in theory. A large number of libs that we use are part of a repo
that also includes a binary. Etcd. Rkt. Heapster. Cadvisor. Docker.
Asking every repo to break itself into a "libs only" repo and a "binary only"
repo is a non-starter.

Given that, what happens once everyone uses glide (world domination, yay)?

  1. heapster (the binary) uses and vendors kubernetes/pkg/client
  2. glide imports all of kubernetes (because you don't want to prune files, more on that later)
  3. heapster now has a vendor/k8s.io/kubernetes/vendor/... dir.
  4. kubernetes vendors the heapster client lib
  5. kubernetes now has a vendor/k8s.io/heapster/vendor/k8s.io/kubernetes/vendor/... directory

At this point you've gone full-recursive. There's not ACTUALLY a cyclical
dependency between Go packages (which are the true unit of import), but glide
has made a problem by claiming that a full repo is the unit of dependency.

Something has to be the turtle at the bottom. Godep respects go packages (plus
licenses and other metadata) so this is not an issue.

I think any vendoring tool HAS TO break the false cycle here, and the way to do
that is to first interrogate the user (which version do you want to keep) and
then discard the recursive deps. Perhaps one answer to "which version do you
want" is "both" in which case it leaves the recursive deps alone - does that
put enough freedom in the users' hands, while still doing something sane by
default?

Bringing this back to Glide, we've been wary to add something that alters a codebase

Well, our lawyers were pretty clear that this is not an alteration in the
sense of triggering redistribution clauses of licenses. It is established
practice within the Go ecosystem. But you shouldn't trust MY lawyers - you
should consider it yourself. That said, I think stripping out
vendor/... and Godeps/_workspace/... and similar things is EVEN LESS
worrisome than pruning down to the minimum set required. I think pruning
recursive deps has to happen.

Regarding glide-vc: if that's the best we can do, we'll do it, but I'll go on
record as it being more clumsy than it needs to be. I can respect (though
disagree) with the position that pruning code out of a repo is an "alteration",
but I can't buy that pruning recursive deps is an alteration. I think the
default should be to process-and-prune recursive deps. If you want to leave
minimization to glide-vc, so as to put the users fingerprints on it, I am fine
with that (but probably won't do it).

Scripts is a cute hack, but it means that everyone's repo will be different
with different target names. It's Makefile all over again.

@mattfarina
Copy link
Member

@thockin thanks for being patient with me while I work through my thinking on this and taking so much time for your responses.

If you get a few moments can you take a look at #339. It's a pull request I just wrote on the topic.

@sgotti
Copy link
Author

sgotti commented Mar 31, 2016

@mattfarina Thanks for listening and working in this. I think that the work done for glide 0.10 (strip-vcs and strip-vendor) is a good base solution for this issue.

As also explained in #349 handling some incompatible packages with same name could be done installing them as separate nested vendors dirs (some like in npm v3) but as explained this is a tricky problem (causing possible incompatibilities).

And now glide-vc can be used just to remove all unneeded packages and files (better than doing vendor management inside it) if license permits this and to preserve space when committing the vendor directory.

If you want this issue should be closed and perhaps future updates will go in new issues.

@sdboyer
Copy link
Member

sdboyer commented Apr 1, 2016

fwiw, in the new SAT solver-based engine, the intention is to flatten as an (aggressive) default - because it's ultimately the only path to a sane system - but to allow exceptions in cases where it's provably safe automatically...and then probably in other, perhaps user-chosen situations as well.

also, this pretty much describes my view:

I can respect (though disagree) with the position that pruning code out of a repo is an "alteration", but I can't buy that pruning recursive deps is an alteration.

pmyjavec added a commit to influxdata/semaphore that referenced this issue Feb 23, 2017
Removal of import  statement (redundant) would break immediately if an upstream package is using
vendoring and stripping sub-vendor directories or "flattening".  See
Masterminds/glide#303 for more.

* Remove redundant import statement
@pmyjavec
Copy link

I'm new to glide and trying to understand how nested vendor directories are handled presently.

From what I can understand, "soft-flattening" is now happening by default, but "hard-flattening" is optional. Do this sound accurate?

For example, all dependencies end up at the top level vendor, but all nested vendor directories can be removed on request by specifying --strip-vendor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants