-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/trace: port to v2 traces [freeze exception] #63960
Comments
cc @golang/runtime |
Change https://go.dev/cl/540256 mentions this issue: |
Change https://go.dev/cl/540259 mentions this issue: |
@golang/release Hey release folks, I'd like to request a freeze exception for this work. It's a little early and there's a solid chance that we'll get this all done before the freeze rolls around, but I wanted to get this on your radar ASAP. My reasoning for the freeze exception is as follows: This is a blocker for enabling the new execution tracer (#60773) by default, because we shouldn't ship a runtime that produces traces that users have no tooling for. However, this codebase is far from mission-critical. If it has bugs, it's not going to break any production services. The new execution tracer on the other hand is in good shape and will easily land before the freeze. Simultaneously, because the new execution tracer work is largely targeting developers that build tools for other developers, asking downstream users to set a Note that I'm also suggesting to enable the new execution tracer by default after the freeze window. While this seems risky, I designed the new tracer's tests to run even if the I propose we also set a hard cutoff on this work at the first release candidate. If we can't get it done by the time the release candidate is being built, then we'll drop it for Go 1.22. Note that we can still land all the in-progress changes before then (to avoid carrying an inventory of unsubmitted CLs) because cmd/trace selects its behavior based on the trace version. As a result, it'll still work exactly the same for old traces; cmd/trace will only be left in a partially-working state for new traces. This further reduces the risk and provides us with an easy off-ramp. |
Approved if needed: we don't expect to get a lot of coverage of this feature before RC1 anyway, so as long as it lands before that it should be okay. However, please try to test with large-scale Google services as early as possible. |
This change adds a new MutatorUtilization for traces for Go 1.22+. To facilitate testing, it also generates a short trace with the gc-stress.go test program (shortening its duration to 10ms) and adds it to the tests for the internal/trace/v2 package. Notably, we make sure this trace has a GCMarkAssistActive event to test that codepath. For #63960. For #60773. Change-Id: I2e61f545988677be716818e2a08641c54c4c201f Reviewed-on: https://go-review.googlesource.com/c/go/+/540256 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
For #63960. Change-Id: I1efe35435e32623aba894a915114e394570ebc56 Reviewed-on: https://go-review.googlesource.com/c/go/+/540259 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
Change https://go.dev/cl/541535 mentions this issue: |
For #63960. Change-Id: I3d8d1567c4ee213e2ffd2bd91d0ffae9c4c42b92 Reviewed-on: https://go-review.googlesource.com/c/go/+/541535 Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> Run-TryBot: Michael Knyszek <[email protected]>
Syscall rendering now works. Instead of rendering them as instant events, they are now rendered as slices. For example: For blocking syscalls, only the time they are executing on a P (before sysmon declares them "blocking") is displayed on the P lane. The Arg label "blocked" indicates that the syscall got blocked: While typing this I realize that it might be nice to display the |
Change https://go.dev/cl/541257 mentions this issue: |
Change https://go.dev/cl/541259 mentions this issue: |
Change https://go.dev/cl/541258 mentions this issue: |
Change https://go.dev/cl/541999 mentions this issue: |
Change https://go.dev/cl/541998 mentions this issue: |
Change https://go.dev/cl/542076 mentions this issue: |
Change https://go.dev/cl/542077 mentions this issue: |
Change https://go.dev/cl/542001 mentions this issue: |
Change https://go.dev/cl/542218 mentions this issue: |
Change https://go.dev/cl/543019 mentions this issue: |
Change https://go.dev/cl/543595 mentions this issue: |
Change https://go.dev/cl/543596 mentions this issue: |
Change https://go.dev/cl/543695 mentions this issue: |
Change https://go.dev/cl/543937 mentions this issue: |
Change https://go.dev/cl/543995 mentions this issue: |
This change refactors the cmd/trace package and adds most of the support for v2 traces. The following features of note are missing in this CL and will be implemented in follow-up CLs: - The focustask filter for the trace viewer - The taskid filter for the trace viewer - The goid filter for the trace viewer - Pprof profiles - The MMU graph - The goroutine analysis pages - The task analysis pages - The region analysis pages This CL makes one notable change to the trace CLI: it makes the -d flag accept an integer to set the debug mode. For old traces -d != 0 works just like -d. For new traces -d=1 means the high-level events and -d=2 means the low-level events. Thanks to Felix Geisendörfer ([email protected]) for doing a lot of work on this CL; I picked this up from him and got a massive headstart as a result. For #60773. For #63960. Change-Id: I3626e22473227c5980134a85f1bb6a845f567b1b Reviewed-on: https://go-review.googlesource.com/c/go/+/542218 Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> TryBot-Bypass: Michael Knyszek <[email protected]>
This is a complete fork and most of a rewrite of the goroutine analysis pages for v2 traces. It fixes an issue with the old page where GC time didn't really make any sense, generalizes the page and breaks things down further, and adds clarifying text. This change also modifies the SummarizeGoroutines API to not stream the trace. This is unfortunate, but we're already reading and holding the entire trace in memory for the trace viewer. We can revisit this decision in the future. Also, we want to do this now because the GoroutineSummary holds on to pointers to events, and these events will be used by the user region and user task analyses. While tracev2 events are values and they should be equivalent no matter how many times we parse a trace, this lets us reference the event in the slice directly. For #60773. For #63960. Fixes #62443. Change-Id: I1c5ab68141869378843f4f2826686038e4533090 Reviewed-on: https://go-review.googlesource.com/c/go/+/541257 Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
This change moves the MMU HTTP handlers and functionality into the traceviewer package, since unlike the goroutine pages the vast majority of that functionality is identical between v1 and v2. This change involves some refactoring so that callers can plug in their own mutator utilization computation functions (which is the only point of difference between v1 and v2). The new interface isn't especially nice, but part of the problem is the MMU handlers depend on specific endpoints to exist. A follow-up CL will clean this up a bit. Like the previous CL did for goroutine analysis, modify the v2 mutator utilization API to accept a slice of trace events. Again, we might as well reuse what was already parsed and will be needed for other purposes. It also simplifies the API slightly. For #60773. For #63960. Change-Id: I6c21ec8d1bf7e95eff5363d0e0005c9217fa00e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/541258 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]>
The last change made the MMU rendering code common and introduced a new API, but it was kind of messy. Part of the problem was that some of the Javascript in the template for the main page referred to specific endpoints on the server. Fix this by having the Javascript access the same endpoint but with a different query variable. Now the Javascript code doesn't depend on specific endpoints, just on query variables for the current endpoint. For #60773. For #63960. Change-Id: I1c559d9859c3a0d62e2094c9d4ab117890b63b31 Reviewed-on: https://go-review.googlesource.com/c/go/+/541259 Auto-Submit: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
For #60773. For #63960. Change-Id: Id97380f19267ec765b25a703ea3e2f284396ad75 Reviewed-on: https://go-review.googlesource.com/c/go/+/541998 Auto-Submit: Michael Knyszek <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This change adds support for the pprof endpoints to cmd/trace/v2. In the process, I realized we need to pass the goroutine summaries to more places, and previous CLs had already done the goroutine analysis during cmd/trace startup. This change thus refactors the goroutine analysis API once again to operate in a streaming manner, and to run at the same time as the initial trace parsing. Now we can include it in the parsedTrace type and pass that around as the de-facto global trace context. Note: for simplicity, this change redefines "syscall" profiles to capture *all* syscalls, not just syscalls that block. IIUC, this choice was partly the result of a limitation in the previous trace format that syscalls don't all have complete durations and many short syscalls are treated as instant. To this end, this change modifies the text on the main trace webpage to reflect this change. For #60773. For #63960. Change-Id: I601d9250ab0849a0bfaef233fd9b1e81aca9a22a Reviewed-on: https://go-review.googlesource.com/c/go/+/541999 Auto-Submit: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
The v2 trace parser currently handles task inheritance and region task association incorrectly. It assumes that a TaskID of 0 means that there is no task. However, this is only true for task events. A TaskID of 0 means that a region gets assigned to the "background task." The parser currently has no concept of a "background task." Fix this by defining the background task as task ID 0 and redefining NoTask to ^uint64(0). This aligns the TaskID values more closely with other IDs in the parser and also enables disambiguating these two cases. For #60773. For #63960. Change-Id: I09c8217b33b87c8f8f8ea3b0203ed83fd3b61e11 Reviewed-on: https://go-review.googlesource.com/c/go/+/543019 Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> TryBot-Bypass: Michael Knyszek <[email protected]>
For v1 traces, cmd/trace contains code for analyzing tasks separately from the goroutine analysis code present in internal/trace. As I started to look into porting that code to v2 traces, I noticed that it wouldn't be too hard to just generalize the existing v2 goroutine summary code to generate exactly the same information. This change does exactly that. For #60773. For #63960. Change-Id: I0cdd9bf9ba11fb292a9ffc37dbf18c2a6a2483b8 Reviewed-on: https://go-review.googlesource.com/c/go/+/542076 Auto-Submit: Michael Knyszek <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This code will be useful for the new tracer, and there's no need to duplicate it. This change copies it to internal/trace/traceviewer, adds some comments, and renames it to TimeHistogram. While we're here, let's get rid of the unused String method which has a comment talking about how awful the rendering is. Also, let's get rid of uses of niceDuration. We'd have to bring it with us in the move and I don't think it's worth it. The difference between the default time.Duration rendering and the niceDuration rendering is usually a few extra digits of precision. Yes, it's noisier, but AFAICT it's not substantially worse. It doesn't seem worth the new API, even if it's just internal. We can also always bring it back later. For #60773. For #63960. Change-Id: I795f58f579f1d503c540c3a40bed12e52bce38e2 Reviewed-on: https://go-review.googlesource.com/c/go/+/542001 LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This change fills out the last of cmd/trace's subpages for v2 traces by adding support for task and region endpoints. For #60773. For #63960. Change-Id: Ifc4c660514b3904788785a1b20e3abc3bb9e55f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/542077 Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
This change adds support for the trace?goid=<goid> endpoint to the trace tool for v2 traces. In effect, this change actually implements a per-goroutine view. I tried to add a link to the main page to enable a "view by goroutines" view without filtering, but the web trace viewer broke the browser tab when there were a few hundred goroutines. The risk of a browser hang probably isn't worth the cases where this is nice, especially since filtering by goroutine already works. Unfortunate, but c'est l'vie. Might be worth revisiting if we change out the web viewer in the future. For #60773. For #63960. Change-Id: I8e29f4ab8346af6708fd8824505c30f2c43db796 Reviewed-on: https://go-review.googlesource.com/c/go/+/543595 TryBot-Bypass: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]>
This change implements support for the trace?focustask=<taskid> endpoint in the trace tool for v2 traces. Note: the one missing feature in v2 vs. v1 is that the "irrelevant" (but still rendered) events are not grayed out. This basically includes events that overlapped with events that overlapped with other events that were in the task time period, but aren't themselves directly associated. This is probably fine -- the UI already puts a very obvious focus on the period of time the selected task was running. For #60773. For #63960. Change-Id: I5c78a220ae816e331b74cb67c01c5cd98be40dd4 Reviewed-on: https://go-review.googlesource.com/c/go/+/543596 Auto-Submit: Michael Knyszek <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
This is a nice-to-have that's now straightforward to do with the new trace format. This change adds a new query variable passed to the /trace endpoint called "view," which indicates the type of view to use. It is orthogonal with task-related views. Unfortunately a goroutine-based view isn't included because it's too likely to cause the browser tab to crash. For #60773. For #63960. Change-Id: Ifbcb8f2d58ffd425819bdb09c586819cb786478d Reviewed-on: https://go-review.googlesource.com/c/go/+/543695 LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Michael Pratt <[email protected]> Auto-Submit: Michael Knyszek <[email protected]>
This change adds support for a goroutine-oriented task view via the /trace?taskid=<taskid> endpoint. This works but it's missing regions. That will be implemented in a follow-up CL. For #60773. For #63960. Change-Id: I086694143e5c71596ac22fc551416868f0b80923 Reviewed-on: https://go-review.googlesource.com/c/go/+/543937 LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Michael Knyszek <[email protected]> Reviewed-by: Michael Pratt <[email protected]>
Change https://go.dev/cl/546026 mentions this issue: |
For #60773. For #62627. For #63960. For #61422. Change-Id: I3c933f7522f65cd36d11d38a268556d92c8053f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/546026 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Michael Knyszek <[email protected]>
For golang#60773. For golang#62627. For golang#63960. For golang#61422. Change-Id: I3c933f7522f65cd36d11d38a268556d92c8053f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/546026 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Michael Knyszek <[email protected]>
Tracking issue for updating cmd/trace for the v2 overhaul (#60773) in CL 538515.
Main Viewer:
totalDuration
arg for blocking syscalls (nice-to-have, see comment)Sub-Pages:
cc @mknyszek
The text was updated successfully, but these errors were encountered: