Add support for go workspaces #457

Merged
merged 3 commits into from
Jun 21, 2024

Conversation

@ejegrova ejegrova commented Jan 24, 2024

discussion: #398

checksum validation:

  • go.work.sum + all go.sum files in modules are used to get checksums
  • if checksum is not found, the property cachi2:missing_hash:in_file has value of go.work.sum
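
A minimal sketch of the lookup described above, not the code from this PR: it assumes the usual three-field `module version hash` line layout of go.sum files, and every function name here is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def parse_sum_file(text: str) -> set[tuple[str, str, str]]:
    """Parse go.sum-style content into (module, version, hash) triples."""
    entries = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            entries.add((parts[0], parts[1], parts[2]))
    return entries


def collect_checksums(sum_file_contents: list[str]) -> set[tuple[str, str, str]]:
    """Union the entries of go.work.sum and of every module's go.sum."""
    known: set[tuple[str, str, str]] = set()
    for text in sum_file_contents:
        known |= parse_sum_file(text)
    return known


def missing_hash_in_file(module: str, version: str, known) -> Optional[str]:
    """Return the file name to report when no checksum exists, else None.

    Assumption: mirrors the description above, where go.work.sum is the
    value reported for cachi2:missing_hash:in_file.
    """
    has_hash = any(m == module and v == version for m, v, _ in known)
    return None if has_hash else "go.work.sum"
```

The point of pooling all files first is that a dependency only counts as missing a hash when none of the sum files in the workspace lists it.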

JIRA: STONEBLD-2043

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

Member

@eskultety eskultety left a comment

Since this is still a draft I only had some immediate comments on the proposal but didn't do a detailed in-depth review of the logic fitting the overall design.

@ejegrova ejegrova force-pushed the workspaces branch 4 times, most recently from 63706be to 3be712f Compare March 14, 2024 12:05
@ejegrova ejegrova marked this pull request as ready for review March 14, 2024 12:41
@ejegrova ejegrova marked this pull request as draft April 10, 2024 09:49
@ejegrova ejegrova force-pushed the workspaces branch 2 times, most recently from c67073f to a1cbcdb Compare April 15, 2024 13:00
@ejegrova ejegrova force-pushed the workspaces branch 2 times, most recently from 3c186c7 to a9b5e69 Compare April 17, 2024 08:19
@ejegrova ejegrova marked this pull request as ready for review April 17, 2024 12:29
Member

@eskultety eskultety left a comment

Looking good. I learnt something new during the review: any repo-local submodule still needs to be declared uniquely within all Go modules, so it is imperative that one still uses the replace keyword to denote that the local implementation should be used, even though that is the only one!
Anyway, I think we need some documentation update on workspaces, don't we?

cachi2/core/package_managers/gomod.py
Comment on lines 863 to 868
main_module_name = go([*go_list, "-m"], run_params).rstrip()
modules_json_stream = go([*go_list, "-m", "-json"], run_params).rstrip()
main_module_dict, workspace_dict_list = _process_modules_json_stream(
    app_dir, modules_json_stream
)

path = main_module_dict["Path"]

Member

Having recently worked on the toolchain selection, which gave me headaches wrt/ unit tests because _resolve_gomod is a beast of a function, my impression has become that anything that needs to execute a Go command should go into a separate function to potentially make mocking in test_resolve_gomod much easier.

Also, from a logical perspective, this particular block seems to be open-coded quite a bit. What if we introduced a function, say _parse_modules, along the following lines:

def _parse_modules(app_dir, version_resolver) -> list[ParsedModule]:
  run_go_list_json
  process_modules_json_stream

  ...
  return [main_module] + workspace_modules

and then in resolve_gomod we would just pop the main module out of the list for some further processing. The idea is to make the code in resolve_gomod cleaner, leaner and easier to follow. Do you think the ^above would help achieve that by consolidating the special casing of the main module we're doing here?
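
To make the proposal a bit more concrete, here is a rough sketch under stated assumptions: it only covers the parsing half (no Go command is run), it assumes the `go list -m -json` output is a stream of concatenated JSON objects carrying `Path` and `Dir` keys, and both function names are hypothetical rather than taken from the cachi2 code base.

```python
from __future__ import annotations

import json
from typing import Any


def split_json_stream(stream: str) -> list[dict[str, Any]]:
    """Decode a stream of concatenated JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(stream):
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < len(stream) and stream[pos].isspace():
            pos += 1
        if pos >= len(stream):
            break
        obj, pos = decoder.raw_decode(stream, pos)
        objects.append(obj)
    return objects


def parse_modules(stream: str, main_module_dir: str) -> list[dict[str, Any]]:
    """Return all modules, with the one located in main_module_dir first.

    Assumption: the main module is identified by its "Dir" value.
    """
    modules = split_json_stream(stream)
    main = [m for m in modules if m.get("Dir") == main_module_dir]
    workspaces = [m for m in modules if m.get("Dir") != main_module_dir]
    return main + workspaces
```

With the main module always at index 0, the caller can pop it off the list and treat the remainder uniformly as workspace modules.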

Contributor

I completely agree this is a better approach, but since this is refactoring work, can we do it as a follow up?

Member

I'd be afraid it would remain TODO forever :), but okay, UNLESS it turns out more substantial changes are needed within this PR.

Contributor

So, I went back and implemented this, and the code looks better now.

I'm now wondering if the workspaces scenario that we added to test_resolve_gomod is worth it at all. It brings more complexity to the test and a bunch of new files committed to the repo, and I think the only coverage it brings us is this line here. All the rest is tested individually, AFAICT.

Member

> I'm now wondering if the workspaces scenario that we added to test_resolve_gomod is worth it at all. It brings more complexity to the test and a bunch of new files committed to the repo, and I think the only coverage it brings us is this line here

You're sure about the coverage? test_resolve_gomod has nothing to do with _create_modules_from_parsed_data - that one is only ever called from fetch_gomod_source which is out of context for test_resolve_gomod.

Contributor

Sorry, I'm stupid, I tagged the wrong line. This is the right one.

@ejegrova ejegrova force-pushed the workspaces branch 5 times, most recently from cc7eb2d to 362afd4 Compare May 28, 2024 16:18
@eskultety
Member

My question is this: what is the main problem of letting users point cachi2 to the go.work file directly, the same way as we expect them to do with go.mod for the main module to be processed? That would lead to consistent behaviour from a UX perspective, where we'd have to process all dependent modules. What are the drawbacks of that? @brunoapimentel do you see a use case where ^this wouldn't work and would simply fail with cachi2 logic?

@brunoapimentel
Contributor

> My question is this: what is the main problem of letting users point cachi2 to the go.work file directly, the same way as we expect them to do with go.mod for the main module to be processed? That would lead to consistent behaviour from a UX perspective, where we'd have to process all dependent modules. What are the drawbacks of that? @brunoapimentel do you see a use case where ^this wouldn't work and would simply fail with cachi2 logic?

Erik and I had a private chat about this, and we figured out that go commands are not affected by which directory they're called in, as long as they're under the directory listed by go env GOWORK.
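
As a small illustration of working from the go env GOWORK value, the sketch below interprets that command's output without running Go itself. The function name is hypothetical, and treating an empty value or "off" as "no workspace" is an assumption about the command's output rather than something stated in this thread.

```python
from pathlib import Path
from typing import Optional


def workspace_dir_from_gowork(go_env_gowork_output: str) -> Optional[Path]:
    """Return the directory holding go.work, or None when no workspace is active.

    go_env_gowork_output is the raw stdout of `go env GOWORK`.
    """
    value = go_env_gowork_output.strip()
    # Assumption: an empty value or "off" means workspaces are not in use
    if value in ("", "off"):
        return None
    return Path(value).parent
```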

@eskultety I played with the idea a little bit. The only problem we have is if the directory that contains the go.work file is not a go module (i.e. does not have a go.mod file). Initially, I was trying to work around the need to have a "main module" (the one passed as the input package path in the CLI), but the whole gomod code is based around that idea, so this led to a lot of changes. I think this could work, and maybe the code could even be improved by that, but to do it right, it would be a major change to the gomod code.

So I tried our second idea, which was to elect the first workspace listed in go.work as the main module. This is a much simpler approach. You can see the result in the new commit I added, and with it, I could fully process a repository just by pointing to its root folder. I still need to double check whether picking a main module this way can have any impact on local replacements/workspaces, since Cachi2 uses the main module's version and path to fill in data on those. But it is doable in case we want to allow the user to just point to go.work.
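
For illustration only, the "first workspace wins" idea could look roughly like the sketch below. It assumes the JSON shape printed by go work edit -json (a `Use` list whose entries carry a `DiskPath`), and the function name is hypothetical; it is not the code from the commit mentioned above.

```python
import json
import os.path
from pathlib import Path


def elect_main_module_dir(go_work_json: str, go_work_dir: Path) -> Path:
    """Pick the first workspace listed in go.work as the main module directory.

    go_work_json is the stdout of `go work edit -json`;
    go_work_dir is the directory that contains the go.work file.
    """
    data = json.loads(go_work_json)
    uses = data.get("Use") or []
    if not uses:
        raise ValueError("go.work does not list any workspace modules")
    # normpath collapses "." and ".." lexically, without touching the filesystem
    return Path(os.path.normpath(go_work_dir / uses[0]["DiskPath"]))
```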

@brunoapimentel
Contributor

brunoapimentel commented Jun 17, 2024

> My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed?

The conclusion we reached about this suggestion is that, although we want to provide users more flexibility by allowing them to point to a directory that contains a go.work file but is not a go module, such a change would require a major refactoring in the code that we don't think is worth pursuing as part of this PR.

The main problem we ran into is how to work around the need for what Cachi2 defines as a "main module", since for each of the input packages fed to the CLI, a call to _resolve_gomod is made where the package is defined as the "main module" for purposes of the resolve algorithm. If the input package points to a folder that is not a valid go module, then we can't use it as the "main module".

Here's a small (and likely incomplete) list of impacted functions that would need to be changed:

  • relevant files (such as go.mod and go.sum) are checked in the main module directory for symlinks that point to outside of the cloned repository
  • all go commands are executed in the main module directory
  • vendored deps are checked in the vendor folder in the main module directory
  • the main module name and its path are used to identify its version (as a version tag can be applied to a repo subpath, and also the module can be nested under a vN folder, where N corresponds to the major version of that module)
  • the main module path is used to identify the relative path of each workspace and local replacements, which will end up composing the purls in the SBOM
  • the main module path is used to verify if a local replacement resolves to a folder inside the repo

I personally believe that all of those can be worked around, and the code will likely end up cleaner after the refactoring. So we can tackle this as a follow up task. @eskultety Please add any details I might've missed.

@eskultety
Member

> My question is this: what is the main problem of letting users point cachi2 to the go.work file directly the same way as we expect them to do with go.mod for the main module to be processed?
>
> The conclusion we reached about this suggestion is that, although we want to provide users more flexibility by allowing them to point to a directory that contains a go.work file but is not a go module, such a change would require a major refactoring in the code that we don't think is worth pursuing as part of this PR.

I'd just add that solutions should, more often than not, be proposed in their entirety. That said, I do have to agree that in this particular case we're talking about significant conceptual changes to how we perceive, represent, and process a Go project structure. Compared to the changes being proposed in this PR, those would inarguably provide only a marginal improvement to the consumer-facing UI. If this weren't the case, or if the changes being proposed also ended up making substantial modifications to the code base, then it would be up for a lengthy discussion and would likely lead to a request for a complete solution.

> The main problem we ran into is how to work around the need for what Cachi2 defines as a "main module", since for each of the input packages fed to the CLI, a call to _resolve_gomod is made where the package is defined as the "main module" for purposes of the resolve algorithm. If the input package points to a folder that is not a valid go module, then we can't use it as the "main module".

Yeah, the concept of needing a "main" module seems off in the context of workspaces, so we definitely need a conceptual change here.

> Here's a small (and likely incomplete) list of impacted functions that would need to be changed:
>
> * relevant files (such as go.mod and go.sum) are checked in the main module directory for symlinks that point to outside of the cloned repository
> * all go commands are executed in the main module directory

With most Go commands being aware of the workspace context, ^this particular change should be insignificant compared to the others.

> * vendored deps are checked in the `vendor` folder in the main module directory

IIUC ^this one should be taken care of in #553. It should also be straightforward, since go work vendor is effectively an equivalent of go mod vendor.

> * the main module name and its path are used to identify its version (as a version tag can be applied to a repo subpath, and also the module can be nested under a `vN` folder, where N corresponds to the major version of that module)

I think ^this one may be the biggest problem to figure out in the context of the suggested future refactor to introduce the option of pointing cachi2 to a directory containing a go.work file rather than a "main" module directory containing the go.mod file.

> * the main module path is used to identify the relative path of each workspace and local replacements, which will end up composing the purls in the SBOM
> * the main module path is used to verify if a local replacement resolves to a folder inside the repo
>
> I personally believe that all of those can be worked around, and the code will likely end up cleaner after the refactoring. So we can tackle this as a follow up task. @eskultety Please add any details I might've missed.

In conclusion I agree with the reasoning @brunoapimentel provided and therefore we should continue with the original approach, thanks for the deep investigation @brunoapimentel! I will follow up by creating an issue tracking the future refactor to host the go.work feature improvement once this is ready to be merged.

@brunoapimentel brunoapimentel force-pushed the workspaces branch 4 times, most recently from 37620e5 to 5476d23 Compare June 19, 2024 10:51
"""

app_dir = RootedPath("/path/to/project")
version_resolver.get_golang_version = lambda _, __: "1.0.0"

Contributor

Any better way to mock the result of this function?

Member

diff --git a/tests/unit/package_managers/test_gomod.py b/tests/unit/package_managers/test_gomod.py
index 5f617d33..7362f851 100644
--- a/tests/unit/package_managers/test_gomod.py
+++ b/tests/unit/package_managers/test_gomod.py
@@ -608,7 +608,7 @@ def test_parse_local_modules(go: mock.Mock, version_resolver: mock.Mock) -> None
     """
 
     app_dir = RootedPath("/path/to/project")
-    version_resolver.get_golang_version = lambda _, __: "1.0.0"
+    version_resolver.get_golang_version.return_value = "1.0.0"
 
     main_module, workspace_modules = _parse_local_modules(go, [], {}, app_dir, version_resolver)
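
A standalone illustration of why return_value is the nicer option, using only unittest.mock. The module name and path passed to the mock below are made up for the example.

```python
from unittest import mock

# A plain Mock stands in for the version_resolver fixture used in the test
version_resolver = mock.Mock()
version_resolver.get_golang_version.return_value = "1.0.0"

# Any call returns the configured value, whatever the arguments are
result = version_resolver.get_golang_version("example.com/mod", "path/to/mod")
assert result == "1.0.0"

# Unlike a lambda assigned over the attribute, the mock still records its calls
version_resolver.get_golang_version.assert_called_once_with("example.com/mod", "path/to/mod")
```

Assigning a lambda replaces the mock attribute with an ordinary function, so call assertions such as assert_called_once_with are no longer available on it.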

Member

@eskultety eskultety left a comment

@brunoapimentel I think we're good. A couple of trivial things, but I trust you on fixing them :).

Comment on lines 815 to 840
    tmp_path: Path,
) -> None:
    mock_run.return_value = path_to_go_work_file.substitute({"tmp_path": tmp_path})

    repo_root = RootedPath(tmp_path)

    go_work_path = _get_go_work_path(repo_root)

    if should_return_none:
        assert go_work_path is None
    else:
        assert go_work_path == repo_root


@mock.patch("cachi2.core.package_managers.gomod.Go.__call__")
def test_get_go_work_path_when_go_work_is_outside_of_repo(
    mock_run: mock.Mock, tmp_path: Path
) -> None:
    mock_run.return_value = "/a/random/path/go.work"

    repo_root = RootedPath(tmp_path)

    error_message = f"Joining path '/a/random/path' to '{tmp_path}': target is outside '{tmp_path}'"

    with pytest.raises(PathOutsideRoot, match=error_message):
        _get_go_work_path(repo_root)

Member

nitpick: You can use our rooted_tmp_path instead and use it directly in all calls/accesses.

Comment on lines 249 to 253
offset = 0
if has_workspaces:
    assert mock_run.call_args_list[0][0][0] == [GO_CMD_PATH, "work", "edit", "-json"]
    # one extra Go command is called when workspaces are present
    offset = 1

Member

Based on how you set up parametrize, has_workspaces can never be true at the same time as force_gomod_tidy, so the offset variable isn't needed at all.

Contributor

@chmeliik chmeliik left a comment

Assuming that version detection works when the workspace modules are tagged with different version tags, this LGTM

cachi2/core/package_managers/gomod.py
Comment on lines 976 to 993
version=main_module_version,
replace=replaced_module,

Contributor

By setting replace and giving the replaced_module a relative path, the version eventually ends up getting detected by version_resolver.get_golang_version(workspace_module_name, workspace_module_path)?

(It's a little confusing, but should be correct. Maybe just don't set version=main_module_version to reduce confusion)

Contributor

@chmeliik chmeliik Jun 21, 2024

Tested locally, it does work. Would be worth adding to the integration test (can be a follow-up)

git clone https://github.com/cachito-testing/cachi2-gomod
cd cachi2-gomod
git checkout go_workspaces

git tag workspace_modules/hello/v1.1.0
git tag workspace_modules/hi/hiii/v1.2.0
git tag workspace_modules/echo/v4.2.0

cachi2 --log-level debug fetch-deps '{"type": "gomod", "path": "workspace_modules/hello"}'
jq < cachi2-output/bom.json '.components[].purl | select(test("workspace_modules.*type=module"))' -r
pkg:golang/github.com/cachito-testing/cachi2-gomod/workspace_modules/[email protected]?type=module
pkg:golang/github.com/cachito-testing/cachi2-gomod/workspace_modules/[email protected]?type=module
pkg:golang/github.com/cachito-testing/cachi2-gomod/workspace_modules/hi/hiii@v1.2.0?type=module
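
The tag layout used above (module subpath followed by the version) is what lets each workspace module carry its own version. Below is a rough sketch of the matching step, assuming plain vMAJOR.MINOR.PATCH tags. The function name is hypothetical and this is not the version_resolver logic from cachi2.

```python
from __future__ import annotations

import re


def versions_for_subpath(tags: list[str], module_subpath: str) -> list[str]:
    """Return the versions from tags shaped like '<module_subpath>/vX.Y.Z'."""
    pattern = re.compile(rf"^{re.escape(module_subpath)}/(v\d+\.\d+\.\d+)$")
    versions = []
    for tag in tags:
        match = pattern.match(tag)
        if match:
            versions.append(match.group(1))
    return versions
```

A tag belonging to a nested module (for example one under workspace_modules/hi/hiii) does not match the parent subpath, because the path segment after the prefix has to be the version itself.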

Contributor

I'll implement the integration test as a follow up, then.

ejegrova and others added 3 commits June 21, 2024 14:21
Whenever workspaces are enabled, the "go list -m" command will return a
list of all workspaces modules instead of the usual single module
present in the path being processed by Cachi2.

For this reason, we need to properly parse this extra data so that they
can be included in the resulting SBOM.

Signed-off-by: ejegrova <[email protected]>
All go.sum files and go.work.sum are checked for checksums. If not
found, the property cachi2:missing_hash:in_file has value of
go.work.sum.

Signed-off-by: Bruno Pimentel <[email protected]>
@brunoapimentel brunoapimentel added this pull request to the merge queue Jun 21, 2024
Merged via the queue into containerbuildsystem:main with commit 5292603 Jun 21, 2024
19 checks passed
6 participants