Add go support #871

Merged
merged 21 commits into from
Nov 3, 2021

Contributor

@nellshamrell nellshamrell commented Sep 7, 2021

Adding Go Support

Summary

This pull request adds support for harvesting and calculating definitions of go modules.

This has been one of the most frequently requested enhancements to ClearlyDefined (including in #765).

Limitations

This pull request only adds in support for go components with a defined go.mod file in them.

Modules were added to Go in Go 1.11 and 1.12 as a dependency management system that "makes dependency version information explicit and easier to manage".

Prior to modules, there were other third-party version management tools, which are likely still used by some people today. Adding support for these is something we can explore in the future, but to start with we are only supporting modules.

Coordinates

A go module's coordinates are formed like this:

go/golang/namespace/name/revision

A complication we encountered early in the architecture process for go support is that module import paths have a wide variety of characters allowed. Additionally, it is very common for module import paths to have multiple slashes in the "namespace". You can see the discussion around this in #862, #864, and #865.

The solution this pull request proposes is to use url encoding for "/" in namespaces.

For example, this import path:

golang.org/x/crypto/v0.0.0-20210921155107-089bfa567519

Becomes these coordinates:

go/golang/golang.org%2fx/crypto/v0.0.0-20210921155107-089bfa567519

This encoding must be used whenever requesting these coordinates, whether for queuing up a harvest or requesting a definition.
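As a rough illustration of the encoding described above, here is a minimal sketch. The helper name `toGoCoordinates` is hypothetical and is not part of this pull request. It assumes the import path has at least two segments, treats the last segment as the name, and joins everything before it with `%2f`.

```javascript
// Hypothetical helper for illustration only; not taken from the ClearlyDefined codebase.
// Assumes the import path has at least two "/"-separated segments.
function toGoCoordinates(importPath, revision) {
  const segments = importPath.split('/')
  const name = segments.pop() // the last segment is the component name
  const namespace = segments.join('%2f') // encode "/" inside the namespace
  return ['go', 'golang', namespace, name, revision].join('/')
}

console.log(toGoCoordinates('golang.org/x/crypto', 'v0.0.0-20210921155107-089bfa567519'))
// prints go/golang/golang.org%2fx/crypto/v0.0.0-20210921155107-089bfa567519
```

The sketch deliberately leaves single-segment import paths unhandled.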

This will require documentation, which will be added to the ClearlyDefined website.

Related Pull Requests

When this pull request is merged and deployed, these pull requests must be merged and deployed as well.

@waynebeaton

I'm trying to get some insight into what the CD ids should look like.

Background: I'm trying to map lines out of a go.sum file to CD coordinates (grabbing this from the go.sum file seems like an obvious solution; if there's a better one, I'd love to hear about it).

AFAICT from the code, discussion, and Google Doc, the id is structured as:

type: go
source: golang
namespace: when there are two segments in the URI, the namespace is the first segment; when there are three segments, the namespace is the first two.
name: the last segment of the URI
version: the version from the line, including the "v".

I've come up with the following example mappings:

github.com/spf13/cobra v0.0.5 h1:f0B+LkLX6DtmRH1isoNA9VTtNUK9K8xYd28JNNfOv/s=
becomes: go/golang/github.com%2Fspf13/cobra/v0.0.5

golang.org/x/tools v0.0.0-20180221164845-07fd8470d635/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
becomes: go/golang/golang.org%2Fx/tools/v0.0.0-20180221164845-07fd8470d635

These take a slightly different form:

google.golang.org/genproto v0.0.0-20190418145605-e7d98fc518a7/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE=
becomes: go/golang/google.golang.org/genproto/v0.0.0-20190418145605-e7d98fc518a7

gopkg.in/alecthomas/kingpin.v2 v2.2.6/go.mod h1:FMv+mEhP44yOT+4EoQTLFTRgOQ1FBLkstjWtayDeSgw=
becomes: go/golang/gopkg.in%2Falecthomas/kingpin.v2/v2.2.6

I'm not sure what to do with this (there's no path or module; what do we do when the URI is only one segment?):

go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=

In case it's at all interesting, I'm teasing the lines in the go.sum file apart using this regular expression:

^(?<source>[^/\s]+)\/(?<path>[^/\s]+)(?:\/(?<module>[^\s]+))?\s(?<version>[^\s/]+).*$
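For anyone who wants to try that expression, here is a small sketch of it applied in JavaScript. The function name `parseGoSumLine` is made up for illustration. The pattern is the one quoted above, with the slashes inside the character classes escaped so that it is unambiguous as a regex literal.

```javascript
// The go.sum pattern quoted above, as a JavaScript regex literal with named groups.
const GO_SUM_LINE = /^(?<source>[^\/\s]+)\/(?<path>[^\/\s]+)(?:\/(?<module>[^\s]+))?\s(?<version>[^\s\/]+).*$/

// Hypothetical helper: returns the captured parts, or null when the line does not match
// (which is what happens for single-segment paths such as go.opencensus.io).
function parseGoSumLine(line) {
  const match = GO_SUM_LINE.exec(line)
  if (!match) return null
  const { source, path, module, version } = match.groups
  return { source, path, module, version }
}

console.log(parseGoSumLine('github.com/spf13/cobra v0.0.5 h1:f0B+LkLX6DtmRH1isoNA9VTtNUK9K8xYd28JNNfOv/s='))
console.log(parseGoSumLine('go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU='))
```

Because the version group excludes "/", the trailing `/go.mod` on go.mod hash lines is left out of the captured version.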

The "v" being part of the version feels weird. I observed this in the example. It's not clear to me, in the case where the version starts with "v0.0.0" whether or not the version should just be the qualifier/hash that follows.

Am I anywhere close?

@nellshamrell
Contributor Author

Hi @waynebeaton!

Thanks for the comments :)

You are indeed very close. I chose to include the "v" because I noticed that it is used in go.mod files like this one and it seemed cleaner to include it. I'm not completely attached to it, though. I believe we have used the "v" in other types of components when specifying the revision, so it feels more consistent to include it.

My current plan, in the case of no path, is to use a "-" for the namespace.

So

go.opencensus.io v0.21.0

would correspond to

go/golang/-/go.opencensus.io/v0.21.0
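Going the other direction, from coordinates back to an import path, the "-" placeholder has to be dropped again. A minimal sketch, assuming "-" always means "no namespace" and that slashes in the namespace are encoded as `%2f` (in either case); `coordinatesToImportPath` is a hypothetical name:

```javascript
// Hypothetical helper for illustration; not taken from the ClearlyDefined codebase.
function coordinatesToImportPath(coordinates) {
  const [type, provider, namespace, name, revision] = coordinates.split('/')
  if (type !== 'go' || provider !== 'golang') {
    throw new Error(`not a go/golang coordinate: ${coordinates}`)
  }
  // "-" stands for an empty namespace; otherwise decode %2f back into "/".
  const prefix = namespace === '-' ? '' : namespace.replace(/%2f/gi, '/') + '/'
  return { importPath: prefix + name, revision }
}

console.log(coordinatesToImportPath('go/golang/-/go.opencensus.io/v0.21.0'))
console.log(coordinatesToImportPath('go/golang/golang.org%2Fx/tools/v0.0.0-20180221164845-07fd8470d635'))
```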

Something I am having some trouble with is that I use proxy.golang.org to download the modules' source. However, modules that are in the go standard library are not available from proxy.golang.org. go get pulls standard library modules from the install of go on someone's system, which makes sense. I'm unsure how to represent the standard library modules in ClearlyDefined or if we should. Do you have thoughts on this?

Thank you so much!

@waynebeaton

I probably should have started by saying that I'm completely new to Go and am just fumbling around trying to figure out how to grab and license-check dependencies...

My preference is to include the standard library modules. Having said that, I can't say that I've ever really thought too hard about the Java runtime... I guess that I'm not sure.

@waynebeaton

> You are indeed very close. I chose to include the "v" because I noticed that it is used in go.mod files like this one and it seemed cleaner to include it. I'm not completely attached to it, though. I believe we have used the "v" in other types of components when specifying the revision, so it feels more consistent to include it.

If it's always there, then it doesn't really add any value. I can't say that I have looked at even a fraction of the ClearlyDefined data, but I don't recall ever having observed another ID with a "v" prefixing the revision. My strong preference is consistency in the format (i.e., to not include it unless it has actual specific meaning).

I also recall (this might have been in the Google Doc) discussion regarding how the revision is represented when it starts with "0.0.0". When we encounter, for example, "v0.0.0-20190418145605-e7d98fc518a7", the actual revision would be "20190418145605-e7d98fc518a7". Has a decision been made about that?

@nellshamrell
Contributor Author

Hi @waynebeaton -

I agree with you that the "v" doesn't really have a meaning in this case, and I'm willing to remove it. I may go back on this if it makes it harder to script automated checks against ClearlyDefined (for example, an application that parses through go dependencies and queries ClearlyDefined for each dependency). Sounds like something to experiment with!

@nellshamrell
Contributor Author

Hi @waynebeaton - apologies for the delay in responding. You are correct in that it's a bit weird that it points to the information page, rather than the source. I'm also running into the problem that there does not appear to be any consistent way (not that I've found, at least) for determining a pointer to the actual source archive.

@nellshamrell
Contributor Author

nellshamrell commented Oct 18, 2021

Testing

Just did some testing (using both this pull request and the equivalent crawler pull request in a local environment)

Modules that harvested fine

(Harvested successfully, all tools ran fine, found declared and discovered licenses)

  • localhost:3000/definitions/go/golang/code.cloudfoundry.org/clock/v1.0.0
  • localhost:3000/definitions/go/golang/go.uber.org/atomic/v1.9.0
  • localhost:3000/definitions/go/golang/gopkg.in/check.v1/v1.0.0-20200227125254-8fa46927fb4f
  • localhost:3000/definitions/go/golang/gopkg.in/yaml.v2/v2.4.0
  • localhost:3000/definitions/go/golang/google.golang.org/protobuf/v1.27.1
  • localhost:3000/definitions/go/golang/cloud.google.com/go/v0.87.0
  • localhost:3000/definitions/go/golang/k8s.io/api/v0.20.4

Modules where the discovered licenses were found, but not declared licenses

  • localhost:3000/definitions/go/golang/golang.org%2fx/net/v0.0.0-20210405180319-a5a99cb37ef4
  • localhost:3000/definitions/go/golang/software.sslmate.com%2fsrc/go-pkcs12/v0.0.0-20210415151418-c5206de65a78
  • localhost:3000/definitions/go/golang/github.com%2fsatori/go.uuid/v1.2.1-0.20181028125025-b2ce2384e17b

Modules where the harvests returned errors

localhost:3000/definitions/go/golang/github.com%2fAzure%2fgo-autorest/autorest/v0.11.20
github.com/Azure/go-autorest/autorest v0.11.20

localhost:3000/definitions/go/golang/github.com%2fAzure/azure-sdk-for-go/v55.8.0+incompatible
github.com/Azure/azure-sdk-for-go v55.8.0+incompatible

localhost:3000/definitions/go/golang/golang.org%2fx/crypto/v0.0.0-20210711020723-a769d52b0f9
golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f9

  • StatusCodeError: 410 - "not found: golang.org/x/crypto@v0.0.0-20210711020723-a769d52b0f9: invalid pseudo-version: revision is shorter than canonical (a769d52b0f97)"
  • does have an entry on pkg.go.dev https://pkg.go.dev/golang.org/x/crypto
  • does have an entry in https://index.golang.org (there are 34 versions)
  • UPDATE - this was due to a typo on my part: I accidentally left a character off the end of the revision. This call did NOT error out: localhost:3000/definitions/go/golang/golang.org%2fx/crypto/v0.0.0-20210711020723-a769d52b0f97
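A typo like this can be caught before a harvest is queued. Go pseudo-versions end in a 14-digit UTC timestamp followed by a 12-character commit hash prefix, so a shorter hash is a strong hint that a character was dropped. The sketch below is a hypothetical pre-check (`checkPseudoVersion` is an invented name) and only inspects the tail of the version string. It does not validate the version against the proxy.

```javascript
// Tail of a Go pseudo-version: "<14-digit timestamp>-<hex commit prefix>",
// optionally followed by "+incompatible". The timestamp is preceded by "-" or ".".
const PSEUDO_VERSION_TAIL = /[.-](\d{14})-([0-9a-f]+)(\+incompatible)?$/

// Hypothetical pre-check: flags pseudo-versions whose hash is not 12 characters long.
function checkPseudoVersion(version) {
  const match = PSEUDO_VERSION_TAIL.exec(version)
  if (!match) return { pseudo: false, ok: true } // ordinary tagged version
  const hash = match[2]
  return { pseudo: true, ok: hash.length === 12, hashLength: hash.length }
}

console.log(checkPseudoVersion('v0.0.0-20210711020723-a769d52b0f9')) // the mistyped revision
console.log(checkPseudoVersion('v0.0.0-20210711020723-a769d52b0f97')) // the corrected revision
```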

@waynebeaton

> Hi @waynebeaton - apologies for the delay in responding. You are correct in that it's a bit weird that it points to the information page, rather than the source. I'm also running into the problem that there does not appear to be any consistent way (not that I've found, at least) for determining a pointer to the actual source archive.

AFAICT, we can make some good guesses based on GitHub URLs and version matching against tags, but it'd sure be handy to have that information in metadata. I'll keep digging to see what I can figure out.

@nellshamrell
Contributor Author

nellshamrell commented Oct 19, 2021

More notes on the test components where we are not finding the declared license:

localhost:3000/definitions/go/golang/golang.org%2fx/net/v0.0.0-20210405180319-a5a99cb37ef4

  • discovered license BSD-2-Clause, BSD-3-Clause, NOASSERTION
  • does not have a declared license
  • looking at the files that are harvested, it does find a license file, but it's a few levels down: golang.org -> x -> net@v0.0.0-20210405180319-a5a99cb37ef4 -> LICENSE

localhost:3000/definitions/go/golang/software.sslmate.com%2fsrc/go-pkcs12/v0.0.0-20210415151418-c5206de65a78

  • discovered licenses BSD-3-Clause, BSD-3-Clause AND MIT
  • does not have a declared license
  • license file is also a few levels down: software.sslmate.com -> src -> go-pkcs12@v0.0.0-20210415151418-c5206de65a78 -> LICENSE

localhost:3000/definitions/go/golang/github.com%2fsatori/go.uuid/v1.2.1-0.20181028125025-b2ce2384e17b

  • discovered license MIT
  • does not have a declared license
  • license file is also a few levels down: github.com -> satori -> go.uuid@v1.2.1-0.20181028125025-b2ce2384e17b -> LICENSE

Other Modules that do not show a declared license (but do show discovered license(s))

GitHub.com modules

Maybe something to do with GitHub being the source?

golang.org modules

Notes on modules that do show a declared license

http://localhost:3000/definitions/go/golang/code.cloudfoundry.org/clock/v1.0.0

  • Declared: Apache-2.0
  • Discovered: Apache-2.0
  • License file path: code.cloudfoundry.org -> clock@v1.0.0 -> LICENSE

http://localhost:3000/definitions/go/golang/go.uber.org/fx/v1.14.2

  • Declared: MIT
  • Discovered: MIT
  • License file path: go.uber.org -> fx@v1.14.2 -> LICENSE

@nellshamrell
Contributor Author

nellshamrell commented Oct 19, 2021

Next steps for this week:

  • Focus on errors - add messaging for when a module does not have a valid go.mod file (DONE)
  • Focus on what in github.com path modules makes ClearlyDefined not recognize the declared license (this is the one I can reproduce consistently, and fixing this may fix the other as well)

@nellshamrell
Contributor Author

I think I know what's going on with the go modules that, when we harvest them, do not show a Declared license.

The key is this function in lib/utils.js

function getLicenseLocations(coordinates) {
  const map = { npm: ['package/'], maven: ['meta-inf/'], pypi: [`${coordinates.name}-${coordinates.revision}/`], go: [`${coordinates.namespace}/${coordinates.name}@${coordinates.revision}/`] }
  return map[coordinates.type]
}

We are looking for licenses in

`${coordinates.namespace}/${coordinates.name}@${coordinates.revision}/`

For modules like http://localhost:3000/definitions/go/golang/code.cloudfoundry.org/clock/v1.0.0, where the declared license is found, the structure of the unpacked module is like this:

code.cloudfoundry.org -
|
 -> clock@v1.0.0
      |
       -> LICENSE

The license file is in the code.cloudfoundry.org/clock@v1.0.0 directory, which matches ${coordinates.namespace}/${coordinates.name}@${coordinates.revision}/.

However, with a module like http://localhost:3000/definitions/go/golang/github.com%2fgoogle%2fgo-github/v32/v32.1.0, the unpacked module is structured like this:

github.com
|
 -> google
      |
       -> go-github
            |
            -> v32@v32.1.0
                 |
                  ->  LICENSE

The license file path is github.com/google/go-github/v32@v32.1.0/LICENSE, which does not match ${coordinates.namespace}/${coordinates.name}@${coordinates.revision}/ because there are some extra directories in the path.

I will update the license file path to include directories in between the namespace directory and the name@version directory.
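One way that change could look, as a sketch only. It assumes coordinates.namespace still carries the %2f-encoded value when the license locations are computed, so decoding it yields the nested directory path. `goLicenseLocation` is a hypothetical name and may not match what the final commits use.

```javascript
// Hypothetical replacement for the `go` entry in getLicenseLocations; illustrative only.
function goLicenseLocation(coordinates) {
  // The namespace is URL-encoded in coordinates (github.com%2fgoogle%2fgo-github),
  // but the unpacked module uses real directories, so turn %2f back into "/".
  const namespaceDir = coordinates.namespace.replace(/%2f/gi, '/')
  return `${namespaceDir}/${coordinates.name}@${coordinates.revision}/`
}

console.log(goLicenseLocation({ namespace: 'github.com%2fgoogle%2fgo-github', name: 'v32', revision: 'v32.1.0' }))
// prints github.com/google/go-github/v32@v32.1.0/
```

Namespaces without encoded slashes, such as code.cloudfoundry.org, pass through unchanged, so the modules that already worked keep their existing location.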

@nellshamrell
Contributor Author

nellshamrell commented Oct 27, 2021

The latest commits have fixed the issues with finding declared licenses for the vast majority of go modules!

@nellshamrell
Contributor Author

nellshamrell commented Oct 27, 2021

Just did a rebase and this seems to be in good shape. There are a couple of things still to do before this is ready for review:

  • Clean up the Associated Crawler PR and get it ready for review
  • Add documentation around requesting definitions/harvests for go components - coordinates for these components (especially those with slashes in the namespace) can be complex
  • Resolve discussion of whether we should include v in the revision for coordinates - i.e. should it be v1.2.3 or 1.2.3?
  • Investigate any changes that might need to be made to the UI (and whether those should be delayed until the UI/UX redesign is deployed by EOY)

@nellshamrell
Contributor Author

nellshamrell commented Oct 28, 2021

With regard to the question of whether we should include the "v" in revisions for go modules, I've been giving this some thought.

I asked someone to send several sample go.mod files to me and noticed that, for each revision defined in them, the v is included. I also took a look at some go.sum files and noticed they follow the same convention, including the "v" for a defined version of a module. Additionally, I took a look at how versions are listed on pkg.go.dev (example) and they also include the v. You also include the v when you request a specific version through proxy.golang.org.

The convention in the go community seems to be to include the "v" and I believe ClearlyDefined should follow the community convention and include the "v" in revisions.
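To illustrate the proxy point: in the GOPROXY protocol the version string, "v" included, appears verbatim in the request path, and uppercase letters in module paths and versions are case-encoded as "!" followed by the lowercase letter. The sketch below only builds the URL for a module zip. The function names are hypothetical, and no request is made.

```javascript
// Case-encode per the GOPROXY protocol: each uppercase letter becomes "!" + its lowercase form.
function caseEncode(s) {
  return s.replace(/[A-Z]/g, c => '!' + c.toLowerCase())
}

// Hypothetical helper: builds the zip URL for a module version on a GOPROXY-style server.
function proxyZipUrl(modulePath, version, base = 'https://proxy.golang.org') {
  return `${base}/${caseEncode(modulePath)}/@v/${caseEncode(version)}.zip`
}

console.log(proxyZipUrl('github.com/Azure/go-autorest/autorest', 'v0.11.20'))
// prints https://proxy.golang.org/github.com/!azure/go-autorest/autorest/@v/v0.11.20.zip
```

Keeping the "v" in ClearlyDefined revisions means the revision can be passed to a helper like this without any rewriting.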

@nellshamrell
Contributor Author

Hi @fossygirl! I have a question on this issue from 2018 #228

It appears that, when a go module is in a repository with multiple go modules and we download that module's source through proxy.golang.org (which is implemented in the related crawler PR and analyzed here in the Service), we only get the source code for the individual go module, not the entire repository.

Currently, we search the source code for the individual go module for license files and determine the declared/discovered licenses from there. Do you see a conflict with the suggestions/requirements in #228?

@fossygirl
Member

@nellshamrell I don't know about conflicts in Go. @jeffmcaffer @jeffmendoza might be good people to talk to here.

@nellshamrell
Contributor Author

@fossygirl and I talked offline - I think we are ok (as far as I can tell) using the license in the module's directory, even if it is part of a larger repo.

@nellshamrell
Contributor Author

nellshamrell commented Oct 28, 2021

Regarding the UI - the only change that would be needed in the website would be in the page where harvests can be queued. This page is currently undergoing a major overhaul, and I plan to hold off on adding go to this page until that design is complete and deployed.

Captured in this issue on the website repo

Signed-off-by: Nell Shamrell <[email protected]>
@nellshamrell nellshamrell marked this pull request as ready for review October 28, 2021 23:32
@jeffmendoza
Member

Looks like docs/architecture/go_components.md will be out of date with the latest decisions. follow-up?

@nellshamrell
Contributor Author

Good point, @jeffmendoza! Done!

@jeffmendoza
Member

Cool, looks great. Happy to see this coming together.
