util/memory: warn potential deadlock for Consume in remove (#16987) #18395

Merged
merged 5 commits into from
Jul 28, 2020

Conversation

ti-srebot
Contributor

cherry-pick #16987 to release-4.0


What problem does this PR solve?

close #16944

Problem Summary:

What is changed and how it works?

What's Changed:
Add a new function consumeNegative(), which does the same accounting as Consume() but accepts only negative values. Because a negative delta can never push a tracker over its quota, the exceed check is removed and no exceed action can be triggered.
Replace the Consume() call in remove() with consumeNegative(). If an exceed action were triggered by Consume() while remove() holds Tracker.mu, a deadlock would occur: either the double lock reported in #16944 or one of the two conflicting lock orders described under "How it Works".
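
To make the idea concrete, here is a minimal sketch of what a quota-free consumeNegative() could look like. It is not the code from this PR: the tracker type below is a simplified, hypothetical stand-in for memory.Tracker, and the panic on positive input is only one possible way to enforce the "negative only" contract.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tracker is a simplified, hypothetical stand-in for memory.Tracker.
type tracker struct {
	parent        *tracker
	bytesConsumed int64
}

// consumeNegative subtracts from this tracker and all of its ancestors.
// It never checks a quota and never calls an exceed action, so it takes
// no mutex and can be called while the caller already holds one.
func (t *tracker) consumeNegative(bytes int64) {
	if bytes > 0 {
		// Assumption for this sketch: reject misuse loudly.
		panic("consumeNegative requires bytes <= 0")
	}
	for cur := t; cur != nil; cur = cur.parent {
		atomic.AddInt64(&cur.bytesConsumed, bytes)
	}
}

func main() {
	root := &tracker{bytesConsumed: 100}
	mid := &tracker{parent: root, bytesConsumed: 100}

	// Models remove(): give back the 40 bytes a detached child had consumed.
	mid.consumeNegative(-40)

	fmt.Println(atomic.LoadInt64(&mid.bytesConsumed), atomic.LoadInt64(&root.bytesConsumed))
}
```

The point of the design is that the function body contains no path to an action callback, so the locks taken by Action() and String() are unreachable from remove().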

How it Works:
An exceed action triggered by Consume() inside remove() causes a deadlock.
Besides the double lock reported in #16944,
I also found two cases of conflicting lock order.
Both involve Tracker.mu.
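
The double lock itself can be illustrated with a small sketch. Go's sync.Mutex is not reentrant, so a goroutine that already holds the mutex blocks forever if it calls Lock() again. The node type and both method names below are hypothetical; TryLock (available since Go 1.18) is used only so that the demo reports the problem instead of hanging, whereas the real code path calls Lock().

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for a Tracker that guards its children
// with a mutex.
type node struct {
	mu sync.Mutex
}

// describe models toString(): it needs mu to walk the children.
// It reports false when mu is already held, where Lock() would block.
func (n *node) describe() bool {
	if !n.mu.TryLock() {
		return false
	}
	defer n.mu.Unlock()
	return true
}

// removeWithAction models remove() holding mu while an exceed action
// runs and tries to describe the same tracker.
func (n *node) removeWithAction() bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.describe()
}

func main() {
	n := &node{}
	fmt.Println("describe alone succeeded:", n.describe())
	fmt.Println("describe inside remove succeeded:", n.removeWithAction())
}
```

The second call fails to acquire the mutex because the same goroutine already holds it; with a plain Lock() that goroutine would never make progress.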

Case 1:

One mutex is LogOnExceed.mutex:

type LogOnExceed struct {
    mutex   sync.Mutex // For synchronization.
    acted   bool
    ConnID  uint64
    logHook func(uint64)
}

The other mutex is Tracker.mu:

type Tracker struct {
    mu struct {
        sync.Mutex
        // The children memory trackers. If the Tracker is the Global Tracker, like executor.GlobalDiskUsageTracker,
        // we wouldn't maintain its children in order to avoid mutex contention.
        children []*Tracker
    }
    // ... (excerpt ends here)

  1. LogOnExceed.mutex.Lock() -> Tracker.mu.Lock()
    Action() acquires LogOnExceed.mutex and then calls t.String():

    func (a *LogOnExceed) Action(t *Tracker) {
        a.mutex.Lock()
        defer a.mutex.Unlock()
        if !a.acted {
            a.acted = true
            if a.logHook == nil {
                logutil.BgLogger().Warn("memory exceeds quota",
                    zap.Error(errMemExceedThreshold.GenWithStackByArgs(t.label, t.BytesConsumed(), t.bytesLimit, t.String())))

    t.String() then calls t.toString():

    tidb/util/memory/tracker.go

    Lines 252 to 254 in 2daee41

    func (t *Tracker) String() string {
        buffer := bytes.NewBufferString("\n")
        t.toString("", buffer)

    toString() acquires Tracker.mu:

    tidb/util/memory/tracker.go

    Lines 258 to 265 in 2daee41

    func (t *Tracker) toString(indent string, buffer *bytes.Buffer) {
        fmt.Fprintf(buffer, "%s\"%s\"{\n", indent, t.label)
        if t.bytesLimit > 0 {
            fmt.Fprintf(buffer, "%s \"quota\": %s\n", indent, t.BytesToString(t.bytesLimit))
        }
        fmt.Fprintf(buffer, "%s \"consumed\": %s\n", indent, t.BytesToString(t.BytesConsumed()))
        t.mu.Lock()

  2. Tracker.mu.Lock() -> LogOnExceed.mutex.Lock()
    remove() acquires Tracker.mu and then calls t.Consume():

    tidb/util/memory/tracker.go

    Lines 155 to 163 in 2daee41

    func (t *Tracker) remove(oldChild *Tracker) {
        t.mu.Lock()
        defer t.mu.Unlock()
        for i, child := range t.mu.children {
            if child != oldChild {
                continue
            }
            t.Consume(-oldChild.BytesConsumed())

    Inside Consume(), rootExceed is set while walking the tracker chain that starts at t:

    var rootExceed *Tracker
    for tracker := t ...
        rootExceed = tracker

    There is a path on which rootExceed == t.
    In that case, Action(rootExceed) is the same call as Action(t):

    rootExceed.actionMu.actionOnExceed.Action(rootExceed)

    Action() then acquires LogOnExceed.mutex:

    func (a *LogOnExceed) Action(t *Tracker) {
        a.mutex.Lock()
        defer a.mutex.Unlock()

Case 2:

Similar to Case 1, but here the mutex that conflicts with Tracker.mu is Tracker.actionMu.

  1. Tracker.actionMu.Lock() -> Tracker.mu.Lock()
    Consume() acquires rootExceed.actionMu and calls Action().
    Action() calls String(), which calls toString(), which acquires t.mu.
  2. Tracker.mu.Lock() -> Tracker.actionMu.Lock()
    remove() acquires t.mu and calls t.Consume().
    t.Consume() then acquires rootExceed.actionMu.
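
Both cases can be checked mechanically by recording, for each code path, which locks are already held when another one is acquired, and then looking for pairs that occur in both orders. The sketch below is only an illustration: lockGraph and its methods are hypothetical helpers, not part of TiDB, and the lock sequences are transcribed by hand from the two cases above, assuming both paths operate on the same Tracker and the same LogOnExceed instance.

```go
package main

import (
	"fmt"
	"sort"
)

// lockGraph records "held before" relations between lock names.
type lockGraph map[string]map[string]bool

// addPath records one code path. Every lock in the list is assumed to
// stay held while the later ones are acquired (as with defer Unlock()).
func (g lockGraph) addPath(locks ...string) {
	for i := range locks {
		for j := i + 1; j < len(locks); j++ {
			if g[locks[i]] == nil {
				g[locks[i]] = map[string]bool{}
			}
			g[locks[i]][locks[j]] = true
		}
	}
}

// conflicts returns every pair of locks that is acquired in both orders.
func (g lockGraph) conflicts() [][2]string {
	var out [][2]string
	for a, next := range g {
		for b := range next {
			if a < b && g[b][a] {
				out = append(out, [2]string{a, b})
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	// Path A: Consume() exceeds the quota and runs the action.
	// Path B: remove() calls Consume(), which exceeds the quota.
	before := lockGraph{}
	before.addPath("Tracker.actionMu", "LogOnExceed.mutex", "Tracker.mu")
	before.addPath("Tracker.mu", "Tracker.actionMu", "LogOnExceed.mutex")
	fmt.Println("conflicting pairs before the change:", len(before.conflicts()))

	// After the change, remove() calls consumeNegative(), which takes
	// no further lock, so path B holds Tracker.mu only.
	after := lockGraph{}
	after.addPath("Tracker.actionMu", "LogOnExceed.mutex", "Tracker.mu")
	after.addPath("Tracker.mu")
	fmt.Println("conflicting pairs after the change:", len(after.conflicts()))
}
```

Under these assumptions the checker finds two conflicting pairs before the change, matching Case 1 and Case 2, and none once remove() no longer reaches Consume().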

#16944 suggests adding a version of Consume() that does not check whether the quota is exceeded.
I agree, because a dedicated function makes it explicit to developers that the value consumed inside remove() must never trigger an exceed action; otherwise, a deadlock may happen.
This helps prevent both the double lock in #16944 and the conflicting lock orders described in this PR from recurring.

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test

Side effects

No

Release note

  • util/memory: warn potential deadlock for Consume in remove

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@SunRunAway, @XuHuaiyu, @Yisaer, @wshwsh12, PTAL.

@zz-jason zz-jason modified the milestones: v4.0.2, v4.0.3 Jul 10, 2020
@XuHuaiyu
Contributor

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 15, 2020
Contributor

@wshwsh12 wshwsh12 left a comment

LGTM

@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 15, 2020
@winoros winoros modified the milestones: v4.0.3, v4.0.4 Jul 15, 2020
Contributor

@Yisaer Yisaer left a comment

LGTM

@ti-srebot
Contributor Author

@Yisaer, thanks for your review. However, LGTM is restricted to Reviewers or higher roles. See the corresponding SIG page for more information. Related SIGs: execution (slack).

@ti-srebot ti-srebot added status/LGT3 The PR has already had 3 LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Jul 20, 2020
@SunRunAway
Contributor

/merge

@ti-srebot
Contributor Author

Sorry @SunRunAway, you don't have permission to trigger auto merge event on this branch.
The version release is in progress.

@zz-jason
Member

/merge

@ti-srebot
Contributor Author

Sorry @zz-jason, you don't have permission to trigger auto merge event on this branch.
The version release is in progress.

@jackysp jackysp removed the contribution This PR is from a community contributor. label Jul 27, 2020
@winoros
Member

winoros commented Jul 28, 2020

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 28, 2020
@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot ti-srebot merged commit 1117b22 into pingcap:release-4.0 Jul 28, 2020
@imtbkcat imtbkcat modified the milestones: v4.0.4, v4.0.5 Jul 28, 2020
@winoros winoros deleted the release-4.0-a9177fe846bf branch July 28, 2020 10:59
Labels
sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT3 The PR has already had 3 LGTM. type/4.0-cherry-pick