Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/gopathwalk: Ignore Bazel symlinks #118

Closed
wants to merge 1 commit into from

Conversation

vincepri
Copy link

For some folks working in the upstream Kubernetes community using goimports has been problematic. One example is working with the cluster-api repository. In the past few days I've noticed a significant slowdown across multiple machines (one with 16 cores) and started debugging.

Performing a CPU profile showed where the application was spending most of its time:

Duration: 26.31s, Total samples = 1.77mins (402.59%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 104.16s, 98.35% of 105.91s total
Dropped 152 nodes (cum <= 0.53s)
Showing top 10 nodes out of 38
      flat  flat%   sum%        cum   cum%
    53.39s 50.41% 50.41%     53.39s 50.41%  runtime.pthread_cond_wait
    29.70s 28.04% 78.45%     29.70s 28.04%  runtime.pthread_cond_signal
     9.11s  8.60% 87.06%      9.11s  8.60%  runtime.pthread_mutex_unlock
     5.21s  4.92% 91.97%      5.42s  5.12%  syscall.syscall
     4.50s  4.25% 96.22%      4.50s  4.25%  runtime.traceGoSysExit
     2.20s  2.08% 98.30%      2.46s  2.32%  syscall.syscall6
     0.02s 0.019% 98.32%      7.35s  6.94%  golang.org/x/tools/internal/fastwalk.readDir
     0.01s 0.0094% 98.33%      8.30s  7.84%  golang.org/x/tools/internal/fastwalk.(*walker).doWork
     0.01s 0.0094% 98.34%      3.11s  2.94%  golang.org/x/tools/internal/fastwalk.(*walker).onDirEnt
     0.01s 0.0094% 98.35%      0.81s  0.76%  runtime.lock

After further investigation, I was able to determine that the major slowdown was on repositories that were built and tested by Bazel. Bazel adds some symlinks to temporary folders when running in a project, which goimports scans as well.

Here is some simple benchmarks:

Runs pre bazel symlink ignore

$ go get golang.org/x/tools/cmd/goimports && time goimports -d pkg/controller/noderef/noderef_controller.go
real	0m8.172s
user	0m2.828s
sys	1m22.427s

real	0m15.525s
user	0m3.285s
sys	2m6.410s

real	0m14.007s
user	0m3.185s
sys	1m55.035s

Runs post bazel symlink ignore

$ go get golang.org/x/tools/cmd/goimports && time goimports -d pkg/controller/noderef/noderef_controller.go
real	0m0.439s
user	0m0.326s
sys	0m5.528s

real	0m0.450s
user	0m0.320s
sys	0m5.288s

real	0m0.450s
user	0m0.323s
sys	0m5.324s

@gopherbot
Copy link
Contributor

This PR (HEAD: e101897) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/tools/+/182417 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 1:

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
Within the next week or so, a maintainer will review your change and provide
feedback. See https://golang.org/doc/contribute.html#review for more info and
tips to get your patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11, it means that this CL will be reviewed as part of the next development
cycle. See https://golang.org/s/release for more details.


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Heschi Kreinick:

Patch Set 1:

I guess GOPATH is set to the root of your Bazel workspace? Have you considered checking in a .goimportsignore instead?


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Vince Prignano:

Patch Set 1:

Patch Set 1:

I guess GOPATH is set to the root of your Bazel workspace? Have you considered checking in a .goimportsignore instead?

In this case, GOPATH=$HOME/go and the workspace directory is $HOME/go/sigs.k8s.io/cluster-api

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Vince Prignano:

Patch Set 1:

Patch Set 1:

Patch Set 1:

I guess GOPATH is set to the root of your Bazel workspace? Have you considered checking in a .goimportsignore instead?

In this case, GOPATH=$HOME/go and the workspace directory is $HOME/go/sigs.k8s.io/cluster-api

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?

One example from cluster-api:

bazel-bin -> /private/var/tmp/_bazel_vince/be0cbd1c3f18d98f10f1daad792debc4/execroot/main/bazel-out/darwin-fastbuild/bin/
bazel-cluster-api -> /private/var/tmp/_bazel_vince/be0cbd1c3f18d98f10f1daad792debc4/execroot/main/
bazel-genfiles -> /private/var/tmp/_bazel_vince/be0cbd1c3f18d98f10f1daad792debc4/execroot/main/bazel-out/darwin-fastbuild/bin/
bazel-out -> /private/var/tmp/_bazel_vince/be0cbd1c3f18d98f10f1daad792debc4/execroot/main/bazel-out/
bazel-testlogs -> /private/var/tmp/_bazel_vince/be0cbd1c3f18d98f10f1daad792debc4/execroot/main/bazel-out/darwin-fastbuild/testlogs/


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Heschi Kreinick:

Patch Set 1:

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?

It should be valid to put a symlink in .goimportsignore. Does adding each of bazel-out,genfiles, etc not work? When you run goimports -v you should see a line like "Directory added to ignore list: bazel-out", which indicates it's successfully statted the symlink and added it to the list of things to not enter in shouldTraverse.


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Vince Prignano:

Patch Set 1:

Patch Set 1:

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?

It should be valid to put a symlink in .goimportsignore. Does adding each of bazel-out,genfiles, etc not work? When you run goimports -v you should see a line like "Directory added to ignore list: bazel-out", which indicates it's successfully statted the symlink and added it to the list of things to not enter in shouldTraverse.

Thanks for the reply, I tried and indeed it shows that the bazel-out is ignored. That said, the filter work only in this specific repository. Working with Kubernetes project, every repository has its own Bazel workspace and bazel files. Goimports (gopathwalk) seems to scan the entire GOPATH multiple times and the bazel-* folders (in other repos) affect the overall performance as well.

Given the lack of support for .goimportsignore to exclude directories based on a pattern, this change seemed necessary.


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Heschi Kreinick:

Patch Set 1:

Patch Set 1:

Patch Set 1:

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?

It should be valid to put a symlink in .goimportsignore. Does adding each of bazel-out,genfiles, etc not work? When you run goimports -v you should see a line like "Directory added to ignore list: bazel-out", which indicates it's successfully statted the symlink and added it to the list of things to not enter in shouldTraverse.

Thanks for the reply, I tried and indeed it shows that the bazel-out is ignored. That said, the filter work only in this specific repository. Working with Kubernetes project, every repository has its own Bazel workspace and bazel files. Goimports (gopathwalk) seems to scan the entire GOPATH multiple times and the bazel-* folders (in other repos) affect the overall performance as well.

Given the lack of support for .goimportsignore to exclude directories based on a pattern, this change seemed necessary.

I would rather not add hardcoded rules for Bazel symlinks and dirs, especially since you can change them with --symlink_prefix. Please fully explain the problem you're trying to solve. Some questions I have: What does the GOPATH in use here look like, and where are the Bazel workspaces in it? Why can't you add a simple .goimportsignore to each repository? Why is this the right change vs. adding pattern support to .goimportsignore?


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Vince Prignano:

Patch Set 1:

Patch Set 1:

Patch Set 1:

Patch Set 1:

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?

It should be valid to put a symlink in .goimportsignore. Does adding each of bazel-out,genfiles, etc not work? When you run goimports -v you should see a line like "Directory added to ignore list: bazel-out", which indicates it's successfully statted the symlink and added it to the list of things to not enter in shouldTraverse.

Thanks for the reply, I tried and indeed it shows that the bazel-out is ignored. That said, the filter work only in this specific repository. Working with Kubernetes project, every repository has its own Bazel workspace and bazel files. Goimports (gopathwalk) seems to scan the entire GOPATH multiple times and the bazel-* folders (in other repos) affect the overall performance as well.

Given the lack of support for .goimportsignore to exclude directories based on a pattern, this change seemed necessary.

I would rather not add hardcoded rules for Bazel symlinks and dirs, especially since you can change them with --symlink_prefix. Please fully explain the problem you're trying to solve. Some questions I have: What does the GOPATH in use here look like, and where are the Bazel workspaces in it? Why can't you add a simple .goimportsignore to each repository? Why is this the right change vs. adding pattern support to .goimportsignore?

Hardcoded rules aren't great, fully agree.

I (and others) usually work across a number of different repositories within the GOPATH, all of which use Bazel as build tool. To give a few examples:

  • kubernetes/kubernetes
  • kubernetes-sigs/cluster-api
  • kubernetes-sigs/cluster-api-provider-aws
  • kubernetes-sigs/cluster-api-provider-gcp
  • kubernetes-sigs/cluster-api-provider-azure
  • kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm
  • kubernetes-sigs/kind
  • .. and so on

The list goes on :). I can definitely write a script locally to dynamically search for the bazel-* symlinks and add them to my .goimportsignore one by one. I contributed this patch because others folks developing with Bazel project might experience the same behavior. Over time, more and more Kubernetes projects (and sub-projects) will opt-in to use bazel and the .goimportsignore will grow as well.

If you think everyone should write their own automation to exclude these symlinks, that's fine and I'm fine closing this. That said, there are already hardcoded rules (e.g. emacs symlink files) and Bazel users can benefit as well.

--vince


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Vince Prignano:

Patch Set 1:

Any updates on this patch?


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Vince Prignano:

Patch Set 1:

Patch Set 1:

Patch Set 1:

Patch Set 1:

The issue I found with .goimportsignore is that it doesn't allow (afaik) to specify regex-style ignore, but instead it's a list of paths. The Bazel directories are temporary dirs which are randomly generated. Does that help?

It should be valid to put a symlink in .goimportsignore. Does adding each of bazel-out,genfiles, etc not work? When you run goimports -v you should see a line like "Directory added to ignore list: bazel-out", which indicates it's successfully statted the symlink and added it to the list of things to not enter in shouldTraverse.

Thanks for the reply, I tried and indeed it shows that the bazel-out is ignored. That said, the filter work only in this specific repository. Working with Kubernetes project, every repository has its own Bazel workspace and bazel files. Goimports (gopathwalk) seems to scan the entire GOPATH multiple times and the bazel-* folders (in other repos) affect the overall performance as well.

Given the lack of support for .goimportsignore to exclude directories based on a pattern, this change seemed necessary.

I would rather not add hardcoded rules for Bazel symlinks and dirs, especially since you can change them with --symlink_prefix. Please fully explain the problem you're trying to solve. Some questions I have: What does the GOPATH in use here look like, and where are the Bazel workspaces in it? Why can't you add a simple .goimportsignore to each repository? Why is this the right change vs. adding pattern support to .goimportsignore?

Should we close this change? I don't have strong opinions, it seemed a good improvement from before.


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Heschi Kreinick:

Patch Set 1:

Sorry, I haven't had the time to get my thoughts together on this and I still haven't really been able to. https://golang.org/issue/30058 and https://golang.org/issue/30058#issuecomment-490252038 are related. I think I might prefer the idea of a per-module .goimportsignore for this case, but I really don't know.

Doesn't goimports need to read from stuff in bazel-genfiles? Or is that not necessary because some Kubernetes always copies the wrapper back?


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Ian Lance Taylor:

Patch Set 1:

This seems related to https://golang.org/issue/30058.


Please don’t reply on this GitHub thread. Visit golang.org/cl/182417.
After addressing review feedback, remember to publish your drafts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants