add runaway tidb-side time checker
Signed-off-by: husharp <[email protected]>
HuSharp committed Jul 29, 2024
1 parent 98b7858 commit bd80d46
Showing 6 changed files with 48 additions and 5 deletions.
8 changes: 4 additions & 4 deletions docs/design/2023-08-24-background-tasks-control.md
@@ -8,7 +8,7 @@

Resource control is used to solve some problems of resource usage under data consolidation. We can currently control some normal query tasks by means of RU limiter and scheduling. But it's not an adaptation for some background or bulk import/export tasks very well.

-Due to the implementation restriction, resource control can't be applied for some tasks such as BR and TiDB Lightning. And for some long-running tasks such as DDL or background auto-analyze, it's also hard to control the resource usage becase it's not easy to select a proper RU settrings for these kind of jobs.
+Due to the implementation restriction, resource control can't be applied for some tasks such as BR and TiDB Lightning. And for some long-running tasks such as DDL or background auto-analyze, it's also hard to control the resource usage because it's not easy to select a proper RU settings for these kind of jobs.

## Design Goals

@@ -35,7 +35,7 @@ CREATE/ALTER RESOURCE GROUP rg1
[ BACKGROUND = ( TASK_TYPES = "br,analyze" ) ];
```

-Currently, we only support set the task types that should be controlled in the background manner. We may extend this interface to include more setttings such as task priority in the future.
+Currently, we only support set the task types that should be controlled in the background manner. We may extend this interface to include more settings such as task priority in the future.

If a resource group's background setting is not set, we automatically apply the `default` resource group's settings to this group.

@@ -55,7 +55,7 @@ In order to control the background tasks' resource usage, we plan to add an extr

![background-control.png](imgs/background-control.png)

-- Control the resource usage of all background tasks by the Resource Limiter: The rate limit is dynamically adjusted to the value via the formula TiKVTotalRUCapcity - sum(RUCostRateOfForgroundTasks), with a fine-grained adjusting duration, we can ensure the foreground tasks' RU is always enough(or near the system's maximum if the foreground requirement reaches the maximum quota), so the background tasks' impact on foreground tasks should be very low; on the other hand, when the foreground resource consumption is low, the controller should increase the limit threshold, so background jobs can take advantage of the remaining resources.
+- Control the resource usage of all background tasks by the Resource Limiter: The rate limit is dynamically adjusted to the value via the formula TiKVTotalRUCapacity - sum(RUCostRateOfForegroundTasks), with a fine-grained adjusting duration, we can ensure the foreground tasks' RU is always enough(or near the system's maximum if the foreground requirement reaches the maximum quota), so the background tasks' impact on foreground tasks should be very low; on the other hand, when the foreground resource consumption is low, the controller should increase the limit threshold, so background jobs can take advantage of the remaining resources.
- The local resource manager will statics RU consumption of background jobs via the Resource Limiter: We will do statistics and report the resource consumption to the global resource manager. In the first stage, we only do statistics globally but control it locally.
- Feedback mechanism: It's better to give feedback on how fast the limiter layer executes tasks on tikv to the upper layer like tidb, so that the upper layer task framework can adjust the number of tasks.
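
Note on the formula in the first bullet: the background limit is the total capacity minus the summed foreground cost rates. The following is only a rough Go sketch of that arithmetic. The name `backgroundLimit`, its parameters, and the clamp at zero are assumptions made for illustration and are not taken from the TiKV implementation.

```go
package main

import "fmt"

// backgroundLimit is a hypothetical helper: it returns the RU/s budget left
// for background tasks, i.e. total capacity minus the sum of the foreground
// cost rates. Clamping at zero is an assumption of this sketch.
func backgroundLimit(totalCapacity float64, foregroundRates []float64) float64 {
	used := 0.0
	for _, r := range foregroundRates {
		used += r
	}
	if remaining := totalCapacity - used; remaining > 0 {
		return remaining
	}
	return 0
}

func main() {
	// Foreground uses 750 of 1000 RU/s, so 250 RU/s remain for background work.
	fmt.Println(backgroundLimit(1000, []float64{300, 450}))
	// Foreground demand exceeds capacity, so background work gets nothing.
	fmt.Println(backgroundLimit(1000, []float64{700, 600}))
}
```

Recomputing this value on a short interval is what lets background jobs pick up spare capacity when foreground load drops.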

@@ -134,7 +134,7 @@ impl Future for LimitedFuture {

In our implementation, we integrate this rate limiter in the following components so it can cover most use cases:

-- Coprocessor. All SQL read requests are handled via the coprocessor component, this can ensure all read reuqests are covered.
+- Coprocessor. All SQL read requests are handled via the coprocessor component, this can ensure all read requests are covered.
- Txn Scheduler. The write requests in tikv are handled via multiple threadpools via a pipeline manner, to make things simple, we only apply the rate limiter in the first phase, that is, the txn scheduler worker pool. Though this is not ideal, the result is acceptable in our benchmark. We may enhance this mechanism in the future.
- Backup. We apply the rate limiter in backup kv scan and sst upload procedure.
- SST Service. Most sst relate operations are handled via the sst service. This ensure BR, TiDB Lightning and DDL(fast mode) can be controlled.
33 changes: 33 additions & 0 deletions pkg/domain/resourcegroup/runaway.go
@@ -16,6 +16,7 @@ package resourcegroup

 import (
 	"context"
+	"fmt"
 	"strings"
 	"sync"
 	"sync/atomic"
@@ -525,6 +526,38 @@ func (r *RunawayChecker) BeforeExecutor() error {
 	return nil
 }

+// CheckKillAction checks whether the query should be killed.
+func (r *RunawayChecker) CheckKillAction() bool {
+	if r.setting == nil && !r.markedByWatch {
+		return false
+	}
+	// mark by rule
+	marked := r.markedByRule.Load()
+	if !marked {
+		now := time.Now()
+		until := r.deadline.Sub(now)
+		if until > 0 {
+			return false
+		}
+		if r.markedByRule.CompareAndSwap(false, true) {
+			r.markRunaway(RunawayMatchTypeIdentify, r.setting.Action, &now)
+			if !r.markedByWatch {
+				r.markQuarantine(&now)
+			}
+		}
+	}
+	// Other actions should be done in BeforeCopRequest.
+	if r.setting.Action != rmpb.RunawayAction_Kill {
+		return false
+	}
+	return true
+}
+
+// Rule returns the rule of the runaway checker.
+func (r *RunawayChecker) Rule() string {
+	return fmt.Sprintf("execElapsedTimeMs:%s", time.Duration(r.setting.Rule.ExecElapsedTimeMs)*time.Millisecond)
+}
+
 // BeforeCopRequest checks runaway and modifies the request if necessary before sending coprocessor request.
 func (r *RunawayChecker) BeforeCopRequest(req *tikvrpc.Request) error {
 	if r.setting == nil && !r.markedByWatch {
1 change: 1 addition & 0 deletions pkg/session/session.go
@@ -1438,6 +1438,7 @@ func (s *session) SetProcessInfo(sql string, t time.Time, command byte, maxExecu
 		RefCountOfStmtCtx:     &s.sessionVars.RefCountOfStmtCtx,
 		MemTracker:            s.sessionVars.MemTracker,
 		DiskTracker:           s.sessionVars.DiskTracker,
+		RunawayChecker:        s.sessionVars.StmtCtx.RunawayChecker,
 		StatsInfo:             plannercore.GetStatsInfo,
 		OOMAlarmVariablesInfo: s.getOomAlarmVariablesInfo(),
 		TableIDs:              s.sessionVars.StmtCtx.TableIDs,
2 changes: 1 addition & 1 deletion pkg/ttl/cache/table.go
@@ -111,7 +111,7 @@ type PhysicalTable struct {
 	TimeColumn *model.ColumnInfo
 }

-// NewBasePhysicalTable create a new PhysicalTable with specific timeColunm.
+// NewBasePhysicalTable create a new PhysicalTable with specific timeColumn.
 func NewBasePhysicalTable(schema model.CIStr,
 	tbl *model.TableInfo,
 	partition model.CIStr,
7 changes: 7 additions & 0 deletions pkg/util/expensivequery/expensivequery.go
@@ -105,6 +105,13 @@ func (eqh *Handle) Run() {
 						sm.Kill(info.ID, true, false)
 					}
 				}
+				if info.RunawayChecker != nil {
+					if info.RunawayChecker.CheckKillAction() {
+						logutil.BgLogger().Warn("runaway query timeout", zap.Duration("costTime", costTime), zap.String("group-name", info.ResourceGroupName),
+							zap.String("rule", info.RunawayChecker.Rule()), zap.String("processInfo", info.String()))
+						sm.Kill(info.ID, true, false)
+					}
+				}
 			}
 			threshold = atomic.LoadUint64(&variable.ExpensiveQueryTimeThreshold)
 			txnThreshold = atomic.LoadUint64(&variable.ExpensiveTxnTimeThreshold)
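
Note on the `Run()` change: the handle now asks each process entry's checker on every pass whether the statement should be killed, and skips entries without a checker. A minimal sketch of one such pass, driven by a ticker, might look like the following. `killChecker`, `procInfo`, `expired`, and `scan` are hypothetical names for illustration. The real loop calls `sm.Kill` and logs instead of collecting IDs, and its tick interval is not shown in this diff.

```go
package main

import (
	"fmt"
	"time"
)

// killChecker and procInfo are hypothetical stand-ins for the checker and
// the process entry used by the expensive-query handle.
type killChecker interface {
	CheckKillAction() bool
}

type procInfo struct {
	ID      uint64
	Checker killChecker // nil when the statement has no runaway rule
}

// expired is a toy checker whose answer is fixed at construction time.
type expired bool

func (e expired) CheckKillAction() bool { return bool(e) }

// scan runs one watchdog pass and returns the IDs that should be killed.
func scan(procs []procInfo) []uint64 {
	var toKill []uint64
	for _, p := range procs {
		if p.Checker != nil && p.Checker.CheckKillAction() {
			toKill = append(toKill, p.ID)
		}
	}
	return toKill
}

func main() {
	procs := []procInfo{
		{ID: 1},                          // no checker attached
		{ID: 2, Checker: expired(false)}, // within its time budget
		{ID: 3, Checker: expired(true)},  // past its deadline
	}
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 2; i++ { // two passes are enough for the illustration
		<-ticker.C
		fmt.Println(scan(procs))
	}
}
```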
2 changes: 2 additions & 0 deletions pkg/util/processinfo.go
@@ -21,6 +21,7 @@ import (
 	"strings"
 	"time"

+	"github.com/pingcap/tidb/pkg/domain/resourcegroup"
 	"github.com/pingcap/tidb/pkg/parser/auth"
 	"github.com/pingcap/tidb/pkg/parser/mysql"
 	"github.com/pingcap/tidb/pkg/session/cursor"
@@ -51,6 +52,7 @@ type ProcessInfo struct {
 	RefCountOfStmtCtx *stmtctx.ReferenceCount
 	MemTracker        *memory.Tracker
 	DiskTracker       *disk.Tracker
+	RunawayChecker    *resourcegroup.RunawayChecker
 	StatsInfo         func(any) map[string]uint64
 	RuntimeStatsColl  *execdetails.RuntimeStatsColl
 	User              string