Continuous profiling causes SQL P999 duration jitter #48695

Closed
djshow832 opened this issue Nov 19, 2023 · 6 comments
Labels
affects-5.4, affects-6.1, affects-6.5, affects-7.1, affects-7.5, may-affects-5.3, severity/major, sig/sql-infra, type/bug

Comments

@djshow832
Contributor

djshow832 commented Nov 19, 2023

Bug Report


1. Minimal reproduction steps (Required)

  1. Enable continuous profiling on TiDB.
  2. Create 5k connections to TiDB.
  3. Send queries at about 1k QPS.
  4. Watch the P999 duration of queries.

2. What did you expect to see? (Required)

No jitter

3. What did you see instead (Required)

Several duration metrics jitter, including compile duration and get TSO duration.
P999 sometimes reaches 100 ms or more.
The jitter happens once every minute, which matches the frequency of continuous profiling.
[screenshot]

4. What is your TiDB version? (Required)

v6.5.4

djshow832 added the type/bug, sig/sql-infra, and severity/major labels on Nov 19, 2023
ti-chi-bot (bot) added the may-affects-5.3, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, and may-affects-7.5 labels on Nov 19, 2023
djshow832 changed the title from "Continuous profiling causes P999 jitter" to "Continuous profiling causes SQL P999 duration jitter" on Nov 19, 2023
djshow832 self-assigned this on Nov 24, 2023
@djshow832
Contributor Author

djshow832 commented Nov 25, 2023

I can reliably reproduce this on both v6.5.4 and v7.4.0. It occurs only when:

  • There are thousands of connections (or goroutines), even if they are idle.
  • Continuous profiling is used rather than manual profiling.

[screenshot]

@djshow832
Contributor Author

djshow832 commented Nov 25, 2023

While the goroutine profile is being collected, there are 4 STW (all goroutines stack trace) events that take about 300 ms in total:
[screenshots]

@djshow832
Contributor Author

djshow832 commented Nov 25, 2023

This may be caused by golang/go#33250, which was fixed in https://go-review.googlesource.com/c/go/+/387415 by doing less work during the STW pause.
However, TiDB v7.4.0 is built with Go 1.21.0, which already includes that fix, and the issue still happens.

@djshow832
Copy link
Contributor Author

djshow832 commented Nov 25, 2023

With manual profiling, I see an STW (goroutine profile) pause of about 1 ms at the beginning of goroutineProfileWithLabelsConcurrent, which is expected.
However, I don't see the STW (all goroutines stack trace) events that continuous profiling shows. This kind of STW is triggered only by runtime.Stack([]byte, bool) when its all argument is true.

[screenshot]

@djshow832
Contributor Author

When I fetch the goroutine profile with ?debug=2, the request becomes quite slow, and the trace shows STW (all goroutines stack trace). This is because, with debug=2, pprof dumps all goroutine stacks through runtime.Stack instead of calling runtime_goroutineProfileWithLabels.
https://github.com/golang/go/blob/go1.21.0/src/runtime/pprof/pprof.go#L692

@djshow832
Contributor Author

I tested with a nightly TiDB cluster, and the bug disappeared.

[screenshot]

2 participants