kv: support unreplicated locks, hook up SELECT FOR UPDATE end-to-end #45701
Conversation
Force-pushed from dd2c7dc to dab151d
LGTM
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/internal/client/batch.go, line 557 at r6 (raw file):
// each of the returned keys.
//
// A new result will be appended to the batch which will contain "rows" (each
double space
pkg/internal/client/db.go, line 419 at r6 (raw file):
maxRows int64, isReverse bool, forUpdate bool,
at least define forUpdate = true, notForUpdate = false
constants.
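The suggestion above can be sketched as follows. This is a hedged illustration of the reviewer's idea, not the actual `db.go` code; the constant names `forUpdate`/`notForUpdate` come from the comment, while the `scan` helper and its return values are hypothetical stand-ins.

```go
package main

import "fmt"

// Named boolean constants, per the review suggestion, so that call sites of
// the scan API are self-documenting instead of passing a bare true/false.
const (
	forUpdate    = true
	notForUpdate = false
)

// scan stands in for the db.go helper taking (maxRows, isReverse, forUpdate).
func scan(maxRows int64, isReverse, locking bool) string {
	if locking {
		return "locking scan"
	}
	return "non-locking scan"
}

func main() {
	// The constant makes the final boolean's meaning obvious at the call site.
	fmt.Println(scan(10, false, forUpdate))
	fmt.Println(scan(10, false, notForUpdate))
}
```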
pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):
// The transaction's lock footprint contains spans where locks have been
// acquired at some point by the transaction. The span set contains spans
// encompassing the lock spans from all writes that have already been proven
please add the word "intent" in here somewhere and make it clear that the footprint contains both replicated and unreplicated locks.
pkg/storage/engine/mvcc.go, line 2734 at r5 (raw file):
intent.Txn.WriteTimestamp, false, unsafeValue, nil, gbuf)
if err != nil {
	log.Warningf(ctx, "unable to find value for %s @ %s: %v",
I'm glad to see this warning gone.
Force-pushed from dab151d to 515d1a9
@rytaft do you mind signing off on the
The sql commit LGTM
Force-pushed from 515d1a9 to 99cfc3e
Switching to pure GitHub review for the rest; Reviewable keeps showing me things I've already seen, probably because of the rebase.
Reviewed 80 of 80 files at r7, 5 of 5 files at r8, 15 of 15 files at r9, 39 of 39 files at r10, 14 of 14 files at r11, 7 of 7 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/internal/client/db.go, line 419 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
at least define forUpdate = true, notForUpdate = false constants.
Also curious why this is called "forUpdate", which is a SQL concept, while this is the KV API. A lock.Mode comes to mind.
No need to do anything here now, just interested in the conversation.
pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please add the word "intent" in here somewhere and make it clear that the footprint contains both replicated and unreplicated locks.
But it shouldn't? ISTM that it should only track intents and that it doesn't even have to know that an intent comes with a lock built-in. I might be missing something. But I definitely don't see a reason it needs to keep track of unreplicated locks.
},
},
{
	// Ditto in reverse.
Test the interaction of scans that a) have a limit (=acquire only locks for keys returned) and b) return tombstones (do we put locks on the tombstone keys and is this intentional? Add commentary on the flag that turns tombstone retrieval on).
@@ -256,7 +352,8 @@ func TestEvaluateBatch(t *testing.T) {
	idKey: storagebase.CmdIDKey("testing"),
	eng:   eng,
}
d.ba.Header.Timestamp = hlc.Timestamp{WallTime: 1}
d.AbortSpan = abortspan.New(1)
What hit this?
pkg/kv/txn_interceptor_pipeliner.go (outdated)
// and all writes, in-flight or not, at the end of prior epochs. All of the
// transaction's in-flight writes are morally in this set as well, but they
// are not stored here to avoid duplication.
// The transaction's lock footprint contains spans where locks have been
Ah, I missed earlier that the pipeliner keeps track of the locks solely for the purpose of handing them to EndTransaction in the end. That makes sense, though it would be nice to clarify where the locks matter (include in txn record) and where the writes do (pipeline).
// After updating the write sets, the write footprint is condensed to ensure
// that it remains under its memory limit.
// After updating the write sets, the lock footprint is condensed to ensure that
// it remains under its memory limit.
Ah, now I understand. The footprint only has spans, it doesn't care whether a lock is replicated or not. It's just for cleanup.
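The distinction worked out in this exchange can be sketched as two separate span sets. This is a minimal illustration with assumed names (not the real txnPipeliner): in-flight writes are tracked so they can be proven (pipelining), while the lock footprint is tracked only so cleanup knows which spans to resolve, regardless of whether a lock was replicated.

```go
package main

import "fmt"

// span is a simplified key span; the real code uses roachpb.Span.
type span struct{ key, endKey string }

// pipeliner is a toy model of the two sets discussed above.
type pipeliner struct {
	inFlightWrites []span // writes awaiting proof of replication (pipelining)
	lockFootprint  []span // all locked spans, handed to EndTxn for cleanup
}

// trackLockingRead goes straight to the footprint: there is nothing to prove
// for an unreplicated lock acquired by a scan.
func (p *pipeliner) trackLockingRead(s span) {
	p.lockFootprint = append(p.lockFootprint, s)
}

// trackWrite is pipelined first; its span is not duplicated in the footprint.
func (p *pipeliner) trackWrite(s span) {
	p.inFlightWrites = append(p.inFlightWrites, s)
}

// proveWrite moves a now-proven write's span into the lock footprint. The
// footprint stores bare spans and never records replicated vs. unreplicated.
func (p *pipeliner) proveWrite(i int) {
	s := p.inFlightWrites[i]
	p.inFlightWrites = append(p.inFlightWrites[:i], p.inFlightWrites[i+1:]...)
	p.lockFootprint = append(p.lockFootprint, s)
}

func main() {
	var p pipeliner
	p.trackLockingRead(span{"a", "b"}) // unreplicated lock from a locking scan
	p.trackWrite(span{"c", ""})        // write intent, pipelined
	p.proveWrite(0)
	fmt.Println(len(p.inFlightWrites), len(p.lockFootprint)) // 0 2
}
```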
# scanning the primary column family.
# scanning the primary column family. A critical reason why this works is
# because column b is NOT NULL, so the UPDATE statement does not acquire
# FOR UPDATE locks on the primary column family.
If we acquired ranged locks on reads, would we just get a lock on b even if it's NULL, or would we still want to lock the PK?
In the release note, could you be a little more precise on when we implicitly apply FOR UPDATE (only when all filters are pushed down, also for UPSERT, etc)? Also, that we only support exclusive locks (auto-upgrade to that). Also, a way to use EXPLAIN to verify whether locks are implicitly acquired. Very happy to see this all go in.
Reviewed 35 of 35 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
This change marks all columns in the YCSB "usertable" as NOT NULL. Doing so allows the load generator to take advantage of cockroachdb#44239, which avoids the KV lookup on the primary column family entirely when querying one of the other column families. The query plans before and after demonstrate this:

```
--- Before
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |               description
------------+-------------+------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2
            | parallel    |

--- After
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |       description
------------+-------------+------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/6/1-/"key"/6/2
```

This becomes very important when running YCSB with a column family per field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row fetch, we don't want them acquiring upgrade locks on the primary column family of a row in which they intend to update only a single column. Doing so would re-introduce the contention between writes to different columns in the same row that column families helped avoid (see cockroachdb#32704). By marking each column as NOT NULL, we can continue to avoid this contention.
99cfc3e
to
ba77311
Compare
TFTRs!
In the release note, could you be a little more precise on when we implicitly apply FOR UPDATE (only when all filters are pushed down, also for UPSERT, etc)? Also, that we only support exclusive locks (auto-upgrade to that). Also, a way to use EXPLAIN to verify whether locks are implicitly acquired.
Done, except for the point about exclusive locks; that seems like an implementation detail that will confuse users.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/internal/client/batch.go, line 557 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
double space
Done.
pkg/internal/client/db.go, line 419 at r6 (raw file):
at least define forUpdate = true, notForUpdate = false constants.
Eh, we don't do this for isReverse, and the callers are directly below.
Also curious why this is called "forUpdate", which is a SQL concept, while this is the KV API. A lock.Mode comes to mind.
Yeah, I see what you're getting at here. We don't support any other locking strength right now though, so I'm not too inspired to change it. "forUpdate" doesn't seem like it needs to be a strict SQL concept if there's only a single locking strength that it can imply.
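The lock.Mode-style alternative raised in this exchange can be sketched as an explicit locking-strength enum instead of a bool. All names here are illustrative assumptions, not the actual CockroachDB lock package; the sketch just shows how the SQL-level forUpdate flag could map onto a KV-level strength while only one non-default level exists.

```go
package main

import "fmt"

// Strength is a hypothetical locking-strength enum for the KV API.
type Strength int

const (
	None      Strength = iota // non-locking read, the default
	Exclusive                 // the only stronger level supported so far
)

func (s Strength) String() string {
	switch s {
	case None:
		return "None"
	case Exclusive:
		return "Exclusive"
	default:
		return "Unknown"
	}
}

// keyLockingFor maps the SQL-level forUpdate flag onto a KV-level strength,
// keeping the SQL concept out of the KV API while leaving room for weaker
// or stronger modes later.
func keyLockingFor(forUpdate bool) Strength {
	if forUpdate {
		return Exclusive
	}
	return None
}

func main() {
	fmt.Println(keyLockingFor(true), keyLockingFor(false)) // Exclusive None
}
```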
pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
But it shouldn't? ISTM that it should only track intents and that it doesn't even have to know that an intent comes with a lock built-in. I might be missing something. But I definitely don't see a reason it needs to keep track of unreplicated locks.
Done.
pkg/kv/txn_interceptor_pipeliner.go, line 182 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ah, I missed earlier that the pipeliner keeps track of the locks solely for the purpose of handing them to EndTransaction in the end. That makes sense, though it would be nice to clarify where the locks matter (include in txn record) and where the writes do (pipeline).
Done.
pkg/kv/txn_interceptor_pipeliner.go, line 443 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ah, now I understand. The footprint only has spans, it doesn't care whether a lock is replicated or not. It's just for cleanup.
Yes, I tried to clarify this in a few places. Sorry for the confusion.
pkg/kv/kvserver/replica_evaluate_test.go, line 279 at r13 (raw file):
a) have a limit (=acquire only locks for keys returned) and
Good idea. Done for both MaxSpanRequestKeys and TargetBytes.
b) return tombstones (do we put locks on the tombstone keys and is this intentional? Add commentary on the flag that turns tombstone retrieval on).
The flags aren't really in the same place. Tombstones is on storage.MVCCScanOptions and KeyLocking is on ScanRequest. They don't really interact because cmd_scan.go never sets Tombstones=true.
The behavior we get from this is intentional though - we don't lock tombstone keys, just like Postgres doesn't lock deleted rows.
pkg/kv/kvserver/replica_evaluate_test.go, line 355 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
What hit this?
When I started running transactional batches I hit this.
pkg/sql/logictest/testdata/logic_test/fk, line 2662 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
If we acquired ranged locks on reads, would we just get a lock on b even if it's NULL, or would we still want to lock the PK?
I don't think I'm exactly understanding your question and how it relates to ranged locks, but I think I may be confusing you about why the NOT NULL part is important. Whether we acquire locks over scan ranges or just points within that range doesn't really matter. The important part here is that NOT NULL allows us to not scan the primary column family at all. If b wasn't marked as NOT NULL, we would need to scan the primary column family during the initial row fetch of the UPDATE, because b's family alone wouldn't tell us whether the row exists or not. This would have acquired an unreplicated lock on the primary column family, which would have blocked the INSERT. The optimizer knows that it only needs to update some columns and not others, but it would have been close to impossible to plumb this information down to the row fetcher and only set the KeyLocking mode on some of the scans. That also causes issues with span coalescing, because now we would need to worry about two adjacent column families that want different locking strengths.

So the key here is that NOT NULL allows us to not scan the primary index at all when performing the UPDATE statement's implicit FOR UPDATE scan. This was enabled by #44239 and is already being exploited in #45755 with YCSB.

I added more to the comment to make this clearer.
Don't forget to update the PR message with the updated release note before merge (not sure it matters, but I think they will both get picked up and so they better be the same).
Reviewed 59 of 59 files at r14, 14 of 14 files at r15, 7 of 7 files at r16, 34 of 34 files at r17.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
This commit renames the IntentSpans field in Transaction, TransactionRecord and EndTxnRequest from "IntentSpans" to "LockSpans". The field is intended to carry the spans of all locks, not just those of write intents.
Purely a rename along with a bit of comment cleanup.
This ensures that the calls to set the system config trigger are made before any part of the plan begins executing. The current approach would have been an issue after the next commit because TableReader plan nodes can now acquire locks, which means that they can cause a Txn to set its key before the mutation plan has its `startExec` method called. This change prevented the call to `maybeSetSystemConfig` from working correctly on statements like: `UPDATE system.zones SET config = $2 WHERE id = $1` With this change, things work correctly again.
Closes cockroachdb#40205.
Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.

cockroachdb#43775 described the three main benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve the performance of transactions that read, modify, and write contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan.
To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.
Force-pushed from ba77311 to a98be1f
Done.
Closes #40205.
Closes #43775.
Informs #41720.
This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.
The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.
#43775 described the three main benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.
Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve the performance of transactions that read, modify, and write contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan. To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.