sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements #45159

nvanbenschoten · 2020-02-18T06:55:49Z

This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements.

Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:

UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t

The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention.

The change introduces a new enable_implicit_select_for_update session variable that controls whether this transformation is applied to mutation statements. It also introduces a sql.defaults.implicit_select_for_update.enabled cluster setting to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in #43775) about the performance impact this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the enable_implicit_select_for_update session variable and the sql.defaults.implicit_select_for_update.enabled cluster setting.

cockroach-teamcity · 2020-02-18T06:55:56Z

This change is

andy-kimball

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 757 at r1 (raw file):

// not worth risking the transformation being a pessimization, so it is only
// applied when doing so does not risk creating artificial contention.
func (b *Builder) tryApplyImplicitLockingToUpdateInput(upd *memo.UpdateExpr) memo.RelExpr {

Why would this only be for Update? Why not call it tryApplyImplicitLockingToMutationInput, and re-use it for Upsert and Delete as well? In that case, I'd expect the parameter to be (input memo.RelExpr) instead. I'd also expect to match the case where there's no Project wrapper (though presumably only the Delete case would trigger this).

I think if we expect there to be even more complex matching scenarios in the future (like to handle DistinctOn and LeftJoin matching for Upsert cases), that'd probably push me back towards making these exploration rules...

Curious to hear @RaduBerinde's opinion.

pkg/sql/opt/exec/execbuilder/mutation.go, line 787 at r1 (raw file):

	}

	// Update the expression tree, making sure to treat is as immutable.

Allocating new nodes here is interesting, though I worry about breaking the memo.RelExpr contract - the FirstExpr and NextExpr methods won't work quite right after this. It's true that today the execbuilder never calls those methods, and really should never call them in the future. Maybe add a comment to call this new pattern out.

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @nvanbenschoten, and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 757 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Why would this only be for Update? Why not call it tryApplyImplicitLockingToMutationInput, and re-use it for Upsert and Delete as well? In that case, I'd expect the parameter to be (input memo.RelExpr) instead. I'd also expect to match the case where there's no Project wrapper (though presumably only the Delete case would trigger this).

I think if we expect there to be even more complex matching scenarios in the future (like to handle DistinctOn and LeftJoin matching for Upsert cases), that'd probably push me back towards making these exploration rules...

Curious to hear @RaduBerinde's opinion.

I think I'd move forward with this approach for now and revisit later when we know for sure how complex it gets. At that point we can also benchmark any new approach against this and see if it's worth it.

It's also conceivable that we'd add some optgen infratructure to help with this (execbuilder "rules" would have a "match" part as usual and the replace part would be just a function call that sets some flag in the Builder).

pkg/sql/opt/exec/execbuilder/mutation.go, line 775 at r1 (raw file):

	//
	// If we do, apply a locking clause to the ScanExpr.
	proj, ok := upd.Input.(*memo.ProjectExpr)

Why do we require the Project? It isn't possible to have just the Scan?

pkg/sql/opt/exec/execbuilder/mutation.go, line 787 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Allocating new nodes here is interesting, though I worry about breaking the memo.RelExpr contract - the FirstExpr and NextExpr methods won't work quite right after this. It's true that today the execbuilder never calls those methods, and really should never call them in the future. Maybe add a comment to call this new pattern out.

Yeah I am not a fan of creating fake expressions like this. We could add a forceLocking flag to the Builder which later is checked by buildScan.

BTW, to make the opt plans more suggestive of what's going on, consider adding an "auto" locking mode which is used for the scans created for mutations (and the flag I mentioned would only apply when it's "auto").

andy-kimball

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 757 at r1 (raw file):

Previously, RaduBerinde wrote…

I think I'd move forward with this approach for now and revisit later when we know for sure how complex it gets. At that point we can also benchmark any new approach against this and see if it's worth it.

It's also conceivable that we'd add some optgen infratructure to help with this (execbuilder "rules" would have a "match" part as usual and the replace part would be just a function call that sets some flag in the Builder).

Nathan explained that the patterns we want to match for are different for each case:

Update is Project => Scan
Delete is Scan
Upsert is Project => LeftJoin => Scan

So combining them isn't too useful.

pkg/sql/opt/exec/execbuilder/mutation.go, line 775 at r1 (raw file):

Previously, RaduBerinde wrote…

Why do we require the Project? It isn't possible to have just the Scan?

I think we pretty much always have a Project, because we need to project the update values, even if they're just constants. One case I thought of where the Project would not be present would be something like:

UPDATE t SET y=x WHERE x>1 AND x<10;

That is, setting one column to another, since there's no Project needed in that case. But it's probably pretty rare.

pkg/sql/opt/exec/execbuilder/mutation.go, line 787 at r1 (raw file):

Previously, RaduBerinde wrote…

Yeah I am not a fan of creating fake expressions like this. We could add a forceLocking flag to the Builder which later is checked by buildScan.

BTW, to make the opt plans more suggestive of what's going on, consider adding an "auto" locking mode which is used for the scans created for mutations (and the flag I mentioned would only apply when it's "auto").

Something like this?

func (b *Builder) buildUpdateInput(
	inputExpr memo.RelExpr, colList opt.ColList, p *memo.MutationPrivate,
) (execPlan, error) {
  if the_pattern_matches {
    b.forceLocking = true
    plan, err := b.buildMutationInput(inputExpr, colList, p)
    b.forceLocking = false
    return plan, err
  }
  return b.buildMutationInput(inputExpr, colList, p)

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 787 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Something like this?

func (b *Builder) buildUpdateInput(
	inputExpr memo.RelExpr, colList opt.ColList, p *memo.MutationPrivate,
) (execPlan, error) {
  if the_pattern_matches {
    b.forceLocking = true
    plan, err := b.buildMutationInput(inputExpr, colList, p)
    b.forceLocking = false
    return plan, err
  }
  return b.buildMutationInput(inputExpr, colList, p)

Yes, exactly. Or just

if pattern_matches {
  b.forceLocking = true
  defer func() { b.forceLocking = false }()
}

nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 757 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Nathan explained that the patterns we want to match for are different for each case:
Update is Project => Scan
Delete is Scan
Upsert is Project => LeftJoin => Scan
So combining them isn't too useful.

Exactly, combining them wouldn't really help because we want to match such varied patterns for each case. So a single tryApplyImplicitLockingToMutationInput would need to branch off on each of the top-level expression types immediately anyway.

pkg/sql/opt/exec/execbuilder/mutation.go, line 775 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think we pretty much always have a Project, because we need to project the update values, even if they're just constants. One case I thought of where the Project would not be present would be something like:
UPDATE t SET y=x WHERE x>1 AND x<10;
That is, setting one column to another, since there's no Project needed in that case. But it's probably pretty rare.

Good catch, that does produce an expression tree without a Project. Moving to the approach Radu suggested below makes this all easier because I no longer need to reconstruct the expression once it's been matched on, so there's less work required when matching on a more diverse pattern.

pkg/sql/opt/exec/execbuilder/mutation.go, line 787 at r1 (raw file):
Got it, I'll try with that approach and see how it turns out. I'll ping this PR when it's updated. Thanks for the suggestions.

BTW, to make the opt plans more suggestive of what's going on, consider adding an "auto" locking mode which is used for the scans created for mutations (and the flag I mentioned would only apply when it's "auto").

I like the idea, but I'll hold off on making such a change for now.

nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 787 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Got it, I'll try with that approach and see how it turns out. I'll ping this PR when it's updated. Thanks for the suggestions.

BTW, to make the opt plans more suggestive of what's going on, consider adding an "auto" locking mode which is used for the scans created for mutations (and the flag I mentioned would only apply when it's "auto").

I like the idea, but I'll hold off on making such a change for now.

I made this change and I like the way it came out much better. Thanks for the suggestion. PTAL.

RaduBerinde

I like it, LGTM

RaduBerinde · 2020-02-19T18:30:11Z

pkg/sql/opt/exec/execbuilder/mutation.go

+	return b.buildMutationInput(ins.Input, colList, &ins.MutationPrivate)
+}
+
+func (b *Builder) buildUpdateInput(upd *memo.UpdateExpr, colList opt.ColList) (execPlan, error) {


[nit] Instead of these variants, you could have a shouldApplyImplicitLockingToMutationInput which takes the mutation op as an argument and does a switch on it inside.

Let me know if you like that better. Added as a separate commit and will squash if you give it a 👍

Friendly ping on this. Happy to go with whatever you prefer.

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @nvanbenschoten)

pkg/sql/opt/exec/execbuilder/mutation.go, line 61 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Friendly ping on this. Happy to go with whatever you prefer.

Sorry for dropping the ball. I like the new one better, thanks!

andy-kimball

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 61 at r2 (raw file):

Previously, RaduBerinde wrote…

Sorry for dropping the ball. I like the new one better, thanks!

Yeah, this is good.

andy-kimball

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @nvanbenschoten, and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 32 at r3 (raw file):

	mutExpr, inputExpr memo.RelExpr, colList opt.ColList, p *memo.MutationPrivate,
) (execPlan, error) {
	if b.shouldApplyImplicitLockingToMutationInput(mutExpr) {

One small nit: might want to comment that re-entrance is not possible because mutations are never nested. Any time I use a pattern like this, I've learned to always think about what happens if we recursively call the method, because that case is a frequent source of bugs.

…ments This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements. Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation: ``` UPDATE t = SELECT FROM t + INSERT INTO t => UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t ``` The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention. The change introduces a new `enable_implicit_select_for_update` session variable that controls whether this transformation is applied to mutation statements. It also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting to serve as the default value for the session variable. The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in cockroachdb#43775) about the performance impact this change has on uncontended and contended workloads. Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the `enable_implicit_select_for_update` session variable and the `sql.defaults.implicit_select_for_update.enabled` cluster setting.

nvanbenschoten

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)

pkg/sql/opt/exec/execbuilder/mutation.go, line 32 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

One small nit: might want to comment that re-entrance is not possible because mutations are never nested. Any time I use a pattern like this, I've learned to always think about what happens if we recursively call the method, because that case is a frequent source of bugs.

Done.

craig · 2020-02-26T20:58:38Z

Build succeeded

GitHub CI (Cockroach)

This change marks all columns in the YCSB "usertable" as NOT NULL. Doing so allows the load generator to take advantage of cockroachdb#44239, which avoids the KV lookup on the primary column family entirely when querying one of the other column families. The query plans before and after demonstrate this: ``` --- Before root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key'; tree | field | description ------------+-------------+------------------------------------------ | distributed | false | vectorized | false render | | └── scan | | | table | usertable@primary | spans | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2 | parallel | --- After root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key'; tree | field | description ------------+-------------+------------------------ | distributed | false | vectorized | false render | | └── scan | | | table | usertable@primary | spans | /"key"/6/1-/"key"/6/2 ``` This becomes very important when running YCSB with a column family per field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row fetch, we don't want them acquiring upgrade locks on the primary column family of the row they are intending to update a single column in. This re-introduces the contention between writes to different columns in the same row that column families helped avoid (see cockroachdb#32704). By marking each column as NOT NULL, we can continue to avoid this contention.

nvanbenschoten requested a review from RaduBerinde February 18, 2020 06:55

nvanbenschoten requested a review from a team as a code owner February 18, 2020 06:55

nvanbenschoten force-pushed the nvanbenschoten/sfuExplain2 branch from 31a1d1b to fcaa153 Compare February 18, 2020 07:17

andy-kimball reviewed Feb 18, 2020

View reviewed changes

RaduBerinde approved these changes Feb 18, 2020

View reviewed changes

andy-kimball reviewed Feb 18, 2020

View reviewed changes

RaduBerinde reviewed Feb 19, 2020

View reviewed changes

nvanbenschoten commented Feb 19, 2020

View reviewed changes

nvanbenschoten force-pushed the nvanbenschoten/sfuExplain2 branch from fcaa153 to b4e6d85 Compare February 19, 2020 17:34

nvanbenschoten commented Feb 19, 2020

View reviewed changes

RaduBerinde approved these changes Feb 19, 2020

View reviewed changes

rmloveland mentioned this pull request Feb 19, 2020

Enable “SELECT FOR UPDATE” Statement cockroachdb/docs#6000

Closed

RaduBerinde requested changes Feb 21, 2020

View reviewed changes

andy-kimball approved these changes Feb 21, 2020

View reviewed changes

andy-kimball reviewed Feb 21, 2020

View reviewed changes

nvanbenschoten force-pushed the nvanbenschoten/sfuExplain2 branch from 2ee8143 to 435fa43 Compare February 26, 2020 19:28

nvanbenschoten commented Feb 26, 2020

View reviewed changes

craig bot merged commit 13c971c into cockroachdb:master Feb 26, 2020

nvanbenschoten deleted the nvanbenschoten/sfuExplain2 branch February 26, 2020 22:31

nvanbenschoten mentioned this pull request Mar 5, 2020

workload/ycsb: take advantage of SELECT FOR UPDATE #45755

Merged

jseldess mentioned this pull request Mar 19, 2020

sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements cockroachdb/docs#6903

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements #45159

sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements #45159

nvanbenschoten commented Feb 18, 2020

cockroach-teamcity commented Feb 18, 2020

andy-kimball left a comment

RaduBerinde left a comment

andy-kimball left a comment

RaduBerinde left a comment

nvanbenschoten left a comment

nvanbenschoten left a comment

RaduBerinde left a comment

RaduBerinde Feb 19, 2020

nvanbenschoten Feb 19, 2020

nvanbenschoten Feb 21, 2020

RaduBerinde left a comment

andy-kimball left a comment

andy-kimball left a comment

nvanbenschoten left a comment

craig bot commented Feb 26, 2020

sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements #45159

sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements #45159

Conversation

nvanbenschoten commented Feb 18, 2020

cockroach-teamcity commented Feb 18, 2020

andy-kimball left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

andy-kimball left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

RaduBerinde Feb 19, 2020

Choose a reason for hiding this comment

nvanbenschoten Feb 19, 2020

Choose a reason for hiding this comment

nvanbenschoten Feb 21, 2020

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

andy-kimball left a comment

Choose a reason for hiding this comment

andy-kimball left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

craig bot commented Feb 26, 2020

Build succeeded