Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: repair Go stack traces #3014

Merged
merged 6 commits into from
Feb 27, 2024
Merged

feat: repair Go stack traces #3014

merged 6 commits into from
Feb 27, 2024

Conversation

kolesnikovae
Copy link
Collaborator

@kolesnikovae kolesnikovae commented Feb 21, 2024

The PR suggests a method for fixing truncated Go heap profile stack traces. The function attempts to infer truncated frames by analyzing neighbors and then repairs the stack traces if there is a high likelihood that the missing part exists in the profile but in a different stack. I manually reviewed a few cases and can confidently say that it produces reliable and valid results. However, it's not immune to errors. In my opinion, any potential mistakes, if they take place, do not cause much harm to a malformed profile (which Go heap profiles often are). Therefore, I consider it safe.

The performance impact of the fix is relatively moderate; approximately 75% of the CPU time is dedicated to sorting, and there are no substantial allocations.

Interestingly, the correction of stack traces does not lead to a significant change in their cardinality (-5-15%). However, this reduces pressure on the symdb, as we expect a tree-like structure in call chains, which is often broken when truncated.

More important, from the UX standpoint, Go heap profiles become way more clear.

Before:
image

After:
image


Before:
image

After:
image

@kolesnikovae kolesnikovae marked this pull request as ready for review February 21, 2024 08:50
@kolesnikovae kolesnikovae requested a review from a team as a code owner February 21, 2024 08:50
//
// For each stack trace S taller than 24 frames: if there is another
// stack trace R taller than 24 frames that overlaps with the given
// one by at least 16 frames in a row from the top, and has frames
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 16 and not less ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a reasonable overlap between stack traces. Half of the max looks good to me.

Copy link
Contributor

@cyriltovena cyriltovena Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test in dev with different value of that configuration just to see the impact.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less overlap size implies more tokens. The more tokens, the more checks we need to perform, the procedure becomes more expensive. In the meantime, with a shorter overlap, the chance that we will "repair" a wrong stack is higher. Therefore I'm not sure if it's worth it.

I experimented with various values and can say that the current ones work well. Thus, I would prefer to deploy this version and optimize it if we deem it necessary.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one.

Quite hard to follow but I don't know if there's anything we can do more than your current comment.

@kolesnikovae kolesnikovae merged commit c286ca2 into main Feb 27, 2024
19 checks passed
@kolesnikovae kolesnikovae deleted the feat/repair-go-stacks branch February 27, 2024 10:42
Comment on lines +33 to +37
for _, s := range p.Sample {
if len(s.LocationId) >= minDepth {
return true
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://go-review.googlesource.com/c/go/+/572396: we also need to check that the depth does not exceed the limit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the fact that truncation has happened could be propagated more clearly: golang/go#43669

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants