Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Proof-of-Concept/WIP: support Kubernetes style staging repositories #638

Closed
wants to merge 1 commit into from

Conversation

sttts
Copy link

@sttts sttts commented May 24, 2017

This is a quick'n'dirty proof of concept to implement staging/ repositories as they are used by the Kubernetes project: https://github.com/kubernetes/kubernetes/tree/master/staging

@sttts
Copy link
Author

sttts commented May 24, 2017

A first (failed) run on Kubernetes: https://gist.github.com/sttts/5625a17825ea338cff4cc6becdef16d3

@sttts
Copy link
Author

sttts commented May 24, 2017

/cc @mfojtik

@sdboyer
Copy link
Member

sdboyer commented May 24, 2017

Interesting! Thanks for putting up a PoC.

A first (failed) run on Kubernetes: https://gist.github.com/sttts/5625a17825ea338cff4cc6becdef16d3

It's not immediately apparent to me how that run is affected by the changes in this PR ( 2119 transitively valid internal packages is roughly the number I remember seeing when I last ran against k8s, a couple weeks ago), but it doesn't seem like these changes you made had anything to do with the failures that apparently occurred:

could not deduce project root for golang.org/x/time/rate: unable to deduce repository and source type for "golang.org/x/time/rate": unable to read metadata: unable to fetch raw metadata: failed HTTP request to URL "http://golang.org/x/time/rate?go-get=1": Get http://golang.org/x/time/rate?go-get=1: dial tcp 216.58.214.49:80: i/o timeout

We likely either need to turn down the parallelism on network requests (right now we just fire them all off simultaneously and hope for the best) and/or introduce smarter dynamic retries

No versions of github.com/coreos/etcd met constraints:
	v3.1.5: Unable to update checked out version: fatal: reference is not a tree: 20490caaf0dcd96bb4a95e40625559def8ef5b04

	v3.1.8: Unable to update checked out version: fatal: reference is not a tree: d267ca9c184e953554257d0acdd1dc9c47d38229

	v3.1.7: Unable to update checked out version: fatal: reference is not a tree: 43b75072bfaca5a7c35c718179defbcabd9a0886

	v3.1.6: Unable to update checked out version: fatal: reference is not a tree: e5b7ee2d03627ca33201da428b8110ef7c3e95f1

This suggests a stale repo at GOPATH/pkg/dep/sources/https---github.com-coreos/etcd that's not been updated with the latest from upstream as it should, which is something I thought we'd mostly worked out already. However, given that some of those tags are a month old...well, did you last try dep a month ago?

Unable to cache go4.org
Unable to cache github.com/samuel/go-zookeeper
Unable to cache github.com/garyburd/redigo
Unable to cache github.com/hashicorp/hcl
Unable to cache github.com/spf13/afero
...

It's concerning that we see so many of these; I wish we logged the error there so we could see the actual error.

to implement staging/ repositories

It'd be good to know more about the use case that entails the use of this staging/src pattern in Kubernetes. I've read the README in https://github.com/kubernetes/kubernetes/tree/master/staging, but what's discussed there is still fairly insider baseball - it's a functional description, but it doesn't really explain what the overall goal of the staging system is - what workflows it enables, how it relates to releases, what it means about import paths, etc. And that's going to be important, because without that, it'll be hard for me to figure out alternatives.

And alternatives are important, because...while we really, really want to make sure we're covering big projects like Kubernetes' needs, it's very unlikely we'd be able to incorporate the changes in this PR as-is.

The changes in pkgtree.go are basically equivalent in significance to defining a new vendor directory for all Go projects everywhere. We simply can't make choices like this - it would make us incompatible with the go tool. Either the toolchain would have to accept these semantics, or we'd have to drop them later.

The symlinking is at least within our realm of choice, but still something that runs against the grain - dep is already very conservative in its treatment of symlinks, as it's a major way of keeping the dragons out.

So, yeah - if you could explain a bit about the broader workflow goals of this staging area, it might help a lot with figuring out some lower-impact alternatives.

@sdboyer
Copy link
Member

sdboyer commented May 24, 2017

oh...wait. rereading that README, it seems like at least part of the goal is essentially being able to hack on multiple repositories that have mutual imports at the same time, while still retaining an encapsulated project and build reproducibility, etc.

If that's the case, then that falls under the header of the "multi-project workflow," which is definitely a known, painful problem. I have a draft writeup of how we might be able to address such things after moving into the official toolchain. Before is harder, but we can see what we can figure out.

@sttts
Copy link
Author

sttts commented May 29, 2017

@sdboyer sorry for the late feedback. was on vacation...

The goal of the staging repos is mainly to prepare the split of a monolithic repo into sub-repos, but before the infrastructure is available to effectively maintain such a multi-repo project. As a middle ground Kubernetes introduces the staging/ directory which is symlinked into vendor/.

The repos in staging/ could also live under vendor/. From a Go point of view they do. Go does not know and should not know about staging/ at all, and luckily it works fine with the symlinks. The repos are put under staging/ only to avoid being removed when dependencies are updated (today via Godep).

In other words: if golang/dep would preserve certain "authorative" repositories in vendor/ while replacing vendor/, we could get rid of staging/ and the ugly symlink hack.

One technical note: we put repos under staging/src/ instead of staging/ directly because this way we can point the GOPATH to that directory. This helps to get godep to work when vendoring the Kubernetes project itself.

@sdboyer
Copy link
Member

sdboyer commented Aug 6, 2017

eek, i didn't realize i hadn't responded to this :/

i'm gonna close this out, as it just changes the model we operate under too much. however, we (@thockin, @kris-nova, @carolynvs and i) did have a productive conversation a couple weeks ago about moving forward with k8s & dep, and we now have a project in which we're hammering some things out.

@sdboyer sdboyer closed this Aug 6, 2017
@sttts
Copy link
Author

sttts commented Aug 7, 2017

@sdboyer I don't see much in that repo. Are there any notes about what you discuessed? Would love to contribute if this is done in the open.

@sdboyer
Copy link
Member

sdboyer commented Aug 7, 2017

@sttts fantastic! there are some notes, though they're kinda sparse - https://docs.google.com/document/d/1uBnrlJIks8UWHI50u0mERxEQLYkeUrIYXeKH4Z-DZMg/edit

some important high-level points from the mtg weren't captured in those notes. for example, we understand that, k8s being as large and multiheaded as it is, a lot of what's important is to set some more fixed goals that both k8s and dep can converge towards, rather than dep just trying to chase k8s' needs of the moment. to that end, we understand that the conversation about moving k8s to dep will necessarily also be a bit interwoven with some more k8s-internal conversations about how, for example, staging/ ought to be handled, and obviating the need for symlink magic in vendor/.

i realize that might sound a bit ominous, but it shouldn't - just some expectation-setting that this is will need to be a deliberate process. also, after talking brass tacks about how the import graphs and distinct subprojects within k8s.io/kubernetes are or are not supposed to intersect, i got to a point of cautious enthusiasm that the choices y'all have already made about staging/* should align quite well with dep's assumptions already. this is just a sketch, a hypothesis at the very most, but it seems like the following could get us at least into the ballpark:

  1. move staging/ to internal/staging - nothing in there is supposed to be imported by anything external, anyway, and this just codifies that
  2. set up dep at the root of k8s.io/kubernetes, as per normal
  3. configure ignore patterns - Enable glob syntax (/...) in ignored packages #709 - for everything under internal/staging/*
  4. set up dep projects for each discrete thing under internal/staging (nothing magic here, just dep init at each dir that we want to treat as a discrete project)
  5. some strategic use of dep ensure -no-vendor (just merged on friday, part of (re)Spec and implement command set #277) will probably be valuable for building those projects

i'm almost certainly forgetting a few things, but that's as much of a braindump as i can muster right now.

@carolynvs
Copy link
Collaborator

I am still working through 1-4. For #3 we hope to avoid adding ignores by naming it _staging.

@sdboyer
Copy link
Member

sdboyer commented Aug 7, 2017

ahh right, i forgot about the underscore

@sttts
Copy link
Author

sttts commented Aug 12, 2017

@sdboyer can you comment how you plan to create the link to the internal/_staging packages in your sketch? How will dep know that internal/_staging/k8s.io/apimachinery should represent k8s.io/apimachinery?

Compare #1000 and fabulous-gopher/k8s-dep-e2e#7 for a quick prototype to allow sources pointing to non-root packages. With that you can use a constraint.source or override.source field to tell dep about the staging dirs.

@sdboyer
Copy link
Member

sdboyer commented Aug 12, 2017

@sttts the idea for internal/_staging was premised on an impression i got during our vidchat that those packages are not supposed to be imported externally. IOW, i do not have an answer to that question because this plan was predicated on not having to have an answer to that question 😄

a note, though - i'm exceedingly uncomfortable with the idea of systematically supporting something like this, as it would effectively destroy the semantics of internal dirs.

@sttts
Copy link
Author

sttts commented Aug 12, 2017

Well, they are not imported "externally". They are imported from within k8s.io/kubernetes under their external name, e.g. k8s.io/apimachinery. At the same time they are used by 3rdparties only from their nightly exports. Hence, the second use-case is completely standard. We need at least a workaround for the first.

@sdboyer
Copy link
Member

sdboyer commented Aug 12, 2017

hmm right. i wish i had taken clearer notes during our discussion - it ended up being fairly rushed 😞

are there imports from within staging back into the k8s.io/kubernetes tree?

We need at least a workaround for the first.

remind of the reason for not simply importing them directly at their path within staging, or the possible future internal/_staging?

@sttts
Copy link
Author

sttts commented Aug 12, 2017

are there imports from within staging back into the k8s.io/kubernetes tree?

no. but, there are imports between staging repos, e.g. staging/k8s.io/apiserver imports k8s.io/apimachinery. We cannot import via a full staging path there as we want to publish them to their real repos on github.

@sttts
Copy link
Author

sttts commented Aug 12, 2017

remind of the reason for not simply importing them directly at their path within staging, or the possible future internal/_staging?

This confuses go as it will have two different type universes: with and without full path.

@sttts
Copy link
Author

sttts commented Aug 14, 2017

@sdboyer @carolynvs is #1000 + fabulous-gopher/k8s-dep-e2e#7 something you could imagine to support as a solution to the staging problem? Clearly #1000 is not perfect yet, but I am very willing to work on it if the answer is yes. Also how would such an extension be discussed? Does it need a proposal or anything formal like that?

About the consistency of the approach some questions (cc @carolynvs):

  1. if two dependencies github.com/a/a and github.com/b/b both specify a source for the same transitive dependency github.com/c/c (or one does, one doesn't), which will win?
  2. are overrides only considered in the current root project and ignored in Gopkg.toml of dependencies?

As an example:

I want to vendor k8s.io/apimachinery and k8s.io/kubernetes. The later would have k8s.io/apimachinery in a staging directory and therefore it would specify a constraint with source k8s.io/kubernetes/internal/staging/k8s.io/apimachinery. Then I would expect:

  • if k8s.io/apimachinery is constrained by me to v1.7.0 and k8s.io/kubernetes as well, I would expect k8s.io/kubernetes and k8s.io/kubernetes/internal/staging/k8s.io/apimachinery in v1.7.0.
  • if k8s.io/apimachinery is constrained by me to v1.7.0 and k8s.io/kubernetes is unconstrained, I would expect to get k8s.io/kubernetes and k8s.io/kubernetes/internal/staging/k8s.io/apimachinery in v1.7.0.
  • if k8s.io/apimachinery is constrained by me to v1.7.0 and k8s.io/kubernetes as master, I would expect a conflict.
  • if k8s.io/apimachinery is unconstrained by me to v1.7.0 and k8s.io/kubernetes is constrained to v1.7.0, I would expect to get k8s.io/kubernetes and k8s.io/kubernetes/internal/staging/k8s.io/apimachinery in v1.7.0.

In other words, as soon as a staging repo is involved via source value, this is considered over the self-standing repo.

Does this match the semantics of the source field today (compare my question 1 above)?

@sdboyer
Copy link
Member

sdboyer commented Aug 14, 2017

ahh, ok. yeah, this is not a great situation 😢 trying to pretend like there are two canonical locations for one body of code is going to be an ugly hack, pretty much no matter what.

even if we did teach dep to allow us to dupe itself into believing that an external path is actually an internal one, none of the version information underneath it would make any sense. the system for swapping in alternate sources still takes version information from the alternate, not the original - meaning that it'll be k8s.io/kubernetes versions & revisions showing up in Gopkg.lock, not k8s.io/apimachinery's. that's just the first problem that occurs to me; maybe not the end of the world, but we're skating around in undefined territory.

are overrides only considered in the current root project and ignored in Gopkg.toml of dependencies?

yes.

if two dependencies github.com/a/a and github.com/b/b both specify a source for the same transitive dependency github.com/c/c (or one does, one doesn't), which will win?

this answer gets a little complicated:

  • if a/a and b/b specify no alternate source for c/c, then it works (typical case)
  • if both a/a and b/b specify the same sources for c/c, then it works
  • if both a/a and b/b specify different sources for c/c, then it's always a conflict
  • if a/a specifies a source for c/c but b/b does not, it may work, but it depends on the order in which a/a and b/b are visited by the solver.

this last case is the one you want, and is a thing in the solver i'm quite unhappy with - #428. obviously, the idea of visit order potentially determining SAT or UNSAT is repugnant, so i'm quite unhappy with this. but i figured i'd leave it until it actually came up as an issue. and, here we are 😢

@sttts
Copy link
Author

sttts commented Aug 14, 2017

this last case is the one you want,

Actually this case is not that critical. It would be fine if the user can override what he actually wants in that case.

In practice this case would happen only if a user wants to vendor k8s.io/kubernetes. We do not support this case today explicitly. People should vendor k8s.io/apimachinery, k8s.io/client-go etc. (all published staging repos from k8s.io/kubernetes).

The only case that really matter for us is the one where we have local source fields (they point to a subdirectory of the same repo they are in) in the root repo, i.e. Kubernetes would be fine with having those non-root source values only in the overrides section (if that makes it easier to accept such a solution). In constraints we can be stricter.

@sdboyer
Copy link
Member

sdboyer commented Aug 14, 2017

hmmm...that's heading back towards territory that's more acceptable - we could probably do it entirely within the solver without needing to define new concepts in SourceManager - but it would still create a pretty nonsensical Gopkg.lock file in terms of the versions and revisions that get recorded there.

@sdboyer sdboyer mentioned this pull request Aug 16, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants