-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: repair Go stack traces #3014
Conversation
// | ||
// For each stack trace S taller than 24 frames: if there is another | ||
// stack trace R taller than 24 frames that overlaps with the given | ||
// one by at least 16 frames in a row from the top, and has frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 16 and not less ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a reasonable overlap between stack traces. Half of the max looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should test in dev with different value of that configuration just to see the impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less overlap size implies more tokens. The more tokens, the more checks we need to perform, the procedure becomes more expensive. In the meantime, with a shorter overlap, the chance that we will "repair" a wrong stack is higher. Therefore I'm not sure if it's worth it.
I experimented with various values and can say that the current ones work well. Thus, I would prefer to deploy this version and optimize it if we deem it necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one.
Quite hard to follow but I don't know if there's anything we can do more than your current comment.
for _, s := range p.Sample { | ||
if len(s.LocationId) >= minDepth { | ||
return true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://go-review.googlesource.com/c/go/+/572396: we also need to check that the depth does not exceed the limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think the fact that truncation has happened could be propagated more clearly: golang/go#43669
The PR suggests a method for fixing truncated Go heap profile stack traces. The function attempts to infer truncated frames by analyzing neighbors and then repairs the stack traces if there is a high likelihood that the missing part exists in the profile but in a different stack. I manually reviewed a few cases and can confidently say that it produces reliable and valid results. However, it's not immune to errors. In my opinion, any potential mistakes, if they take place, do not cause much harm to a malformed profile (which Go heap profiles often are). Therefore, I consider it safe.
The performance impact of the fix is relatively moderate; approximately 75% of the CPU time is dedicated to sorting, and there are no substantial allocations.
Interestingly, the correction of stack traces does not lead to a significant change in their cardinality (-5-15%). However, this reduces pressure on the symdb, as we expect a tree-like structure in call chains, which is often broken when truncated.
More important, from the UX standpoint, Go heap profiles become way more clear.
Before:
After:
Before:
After: