kv: support unreplicated locks, hook up SELECT FOR UPDATE end-to-end #45701
Conversation
Force-pushed from dd2c7dc to dab151d
LGTM
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/internal/client/batch.go, line 557 at r6 (raw file):
// each of the returned keys.
//
// A new result will be appended to the batch which will contain "rows" (each
double space
pkg/internal/client/db.go, line 419 at r6 (raw file):
maxRows int64, isReverse bool, forUpdate bool,
at least define forUpdate = true, notForUpdate = false
constants.
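The suggestion above can be sketched as follows. This is a hedged illustration of the reviewer's idea, not the actual `db.go` code; the constant names `forUpdate`/`notForUpdate` come from the comment, while the `scan` helper and its return values are hypothetical stand-ins.

```go
package main

import "fmt"

// Named boolean constants, per the review suggestion, so that call sites of
// the scan API are self-documenting instead of passing a bare true/false.
const (
	forUpdate    = true
	notForUpdate = false
)

// scan stands in for the db.go helper taking (maxRows, isReverse, forUpdate).
func scan(maxRows int64, isReverse, locking bool) string {
	if locking {
		return "locking scan"
	}
	return "non-locking scan"
}

func main() {
	// The constant makes the final boolean's meaning obvious at the call site.
	fmt.Println(scan(10, false, forUpdate))
	fmt.Println(scan(10, false, notForUpdate))
}
```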
pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):
// The transaction's lock footprint contains spans where locks have been
// acquired at some point by the transaction. The span set contains spans
// encompassing the lock spans from all writes that have already been proven
please add the word "intent" in here somewhere and make it clear that the footprint contains both replicated and unreplicated locks.
pkg/storage/engine/mvcc.go, line 2734 at r5 (raw file):
intent.Txn.WriteTimestamp, false, unsafeValue, nil, gbuf)
if err != nil {
	log.Warningf(ctx, "unable to find value for %s @ %s: %v",
I'm glad to see this warning gone.
Force-pushed from dab151d to 515d1a9
@rytaft do you mind signing off on the
The sql commit LGTM
Force-pushed from 515d1a9 to 99cfc3e
Switching to pure GitHub review for the rest; Reviewable keeps showing me things I've already seen, probably because of the rebase.
Reviewed 80 of 80 files at r7, 5 of 5 files at r8, 15 of 15 files at r9, 39 of 39 files at r10, 14 of 14 files at r11, 7 of 7 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/internal/client/db.go, line 419 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
at least define forUpdate = true, notForUpdate = false constants.
Also curious why this is called "forUpdate", which is a SQL concept, while this is the KV API. A lock.Mode comes to mind.
No need to do anything here now, just interested in the conversation.
pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please add the word "intent" in here somewhere and make it clear that the footprint contains both replicated and unreplicated locks.
But it shouldn't? ISTM that it should only track intents and that it doesn't even have to know that an intent comes with a lock built-in. I might be missing something. But I definitely don't see a reason it needs to keep track of unreplicated locks.
},
},
{
	// Ditto in reverse.
Test the interaction of scans that a) have a limit (=acquire only locks for keys returned) and b) return tombstones (do we put locks on the tombstone keys and is this intentional? Add commentary on the flag that turns tombstone retrieval on).
@@ -256,7 +352,8 @@ func TestEvaluateBatch(t *testing.T) {
	idKey: storagebase.CmdIDKey("testing"),
	eng:   eng,
}
d.ba.Header.Timestamp = hlc.Timestamp{WallTime: 1}
d.AbortSpan = abortspan.New(1)
What hit this?
pkg/kv/txn_interceptor_pipeliner.go (outdated)
// and all writes, in-flight or not, at the end of prior epochs. All of the
// transaction's in-flight writes are morally in this set as well, but they
// are not stored here to avoid duplication.
// The transaction's lock footprint contains spans where locks have been
Ah, I missed earlier that the pipeliner keeps track of the locks solely for the purpose of handing them to EndTransaction in the end. That makes sense, though it would be nice to clarify where the locks matter (include in txn record) and where the writes do (pipeline).
// After updating the write sets, the write footprint is condensed to ensure
// that it remains under its memory limit.
// After updating the write sets, the lock footprint is condensed to ensure that
// it remains under its memory limit.
Ah, now I understand. The footprint only has spans, it doesn't care whether a lock is replicated or not. It's just for cleanup.
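The distinction worked out in this exchange can be sketched as two separate span sets. This is a minimal illustration with assumed names (not the real txnPipeliner): in-flight writes are tracked so they can be proven (pipelining), while the lock footprint is tracked only so cleanup knows which spans to resolve, regardless of whether a lock was replicated.

```go
package main

import "fmt"

// span is a simplified key span; the real code uses roachpb.Span.
type span struct{ key, endKey string }

// pipeliner is a toy model of the two sets discussed above.
type pipeliner struct {
	inFlightWrites []span // writes awaiting proof of replication (pipelining)
	lockFootprint  []span // all locked spans, handed to EndTxn for cleanup
}

// trackLockingRead goes straight to the footprint: there is nothing to prove
// for an unreplicated lock acquired by a scan.
func (p *pipeliner) trackLockingRead(s span) {
	p.lockFootprint = append(p.lockFootprint, s)
}

// trackWrite is pipelined first; its span is not duplicated in the footprint.
func (p *pipeliner) trackWrite(s span) {
	p.inFlightWrites = append(p.inFlightWrites, s)
}

// proveWrite moves a now-proven write's span into the lock footprint. The
// footprint stores bare spans and never records replicated vs. unreplicated.
func (p *pipeliner) proveWrite(i int) {
	s := p.inFlightWrites[i]
	p.inFlightWrites = append(p.inFlightWrites[:i], p.inFlightWrites[i+1:]...)
	p.lockFootprint = append(p.lockFootprint, s)
}

func main() {
	var p pipeliner
	p.trackLockingRead(span{"a", "b"}) // unreplicated lock from a locking scan
	p.trackWrite(span{"c", ""})        // write intent, pipelined
	p.proveWrite(0)
	fmt.Println(len(p.inFlightWrites), len(p.lockFootprint)) // 0 2
}
```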
# scanning the primary column family.
# scanning the primary column family. A critical reason why this works is
# because column b is NOT NULL, so the UPDATE statement does not acquire
# FOR UPDATE locks on the primary column family.
If we acquired ranged locks on reads, would we just get a lock on b even if it's NULL, or would we still want to lock the PK?
In the release note, could you be a little more precise on when we implicitly apply FOR UPDATE (only when all filters are pushed down, also for UPSERT, etc)? Also, that we only support exclusive locks (auto-upgrade to that). Also, a way to use EXPLAIN to verify whether locks are implicitly acquired. Very happy to see this all go in.
Reviewed 35 of 35 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
This change marks all columns in the YCSB "usertable" as NOT NULL. Doing so allows the load generator to take advantage of cockroachdb#44239, which avoids the KV lookup on the primary column family entirely when querying one of the other column families. The query plans before and after demonstrate this:

```
--- Before
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |               description
------------+-------------+------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2
            | parallel    |

--- After
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |       description
------------+-------------+------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/6/1-/"key"/6/2
```

This becomes very important when running YCSB with a column family per field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row fetch, we don't want them acquiring upgrade locks on the primary column family of a row in which they intend to update only a single column. Doing so would re-introduce the contention between writes to different columns in the same row that column families helped avoid (see cockroachdb#32704). By marking each column as NOT NULL, we can continue to avoid this contention.
99cfc3e
to
ba77311
Compare
TFTRs!
In the release note, could you be a little more precise on when we implicitly apply FOR UPDATE (only when all filters are pushed down, also for UPSERT, etc)? Also, that we only support exclusive locks (auto-upgrade to that). Also, a way to use EXPLAIN to verify whether locks are implicitly acquired.
Done, except for the point about exclusive locks; that seems like an implementation detail that will confuse users.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/internal/client/batch.go, line 557 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
double space
Done.
pkg/internal/client/db.go, line 419 at r6 (raw file):
at least define forUpdate = true, notForUpdate = false constants.
Eh, we don't do this for isReverse, and the callers are directly below.
Also curious why this is called "forUpdate", which is a SQL concept, while this is the KV API. A lock.Mode comes to mind.
Yeah, I see what you're getting at here. We don't support any other locking strength right now though, so I'm not too inspired to change it. "forUpdate" doesn't seem like it needs to be a strict SQL concept if there's only a single locking strength that it can imply.
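The lock.Mode-style alternative raised in this exchange can be sketched as an explicit locking-strength enum instead of a bool. All names here are illustrative assumptions, not the actual CockroachDB lock package; the sketch just shows how the SQL-level forUpdate flag could map onto a KV-level strength while only one non-default level exists.

```go
package main

import "fmt"

// Strength is a hypothetical locking-strength enum for the KV API.
type Strength int

const (
	None      Strength = iota // non-locking read, the default
	Exclusive                 // the only stronger level supported so far
)

func (s Strength) String() string {
	switch s {
	case None:
		return "None"
	case Exclusive:
		return "Exclusive"
	default:
		return "Unknown"
	}
}

// keyLockingFor maps the SQL-level forUpdate flag onto a KV-level strength,
// keeping the SQL concept out of the KV API while leaving room for weaker
// or stronger modes later.
func keyLockingFor(forUpdate bool) Strength {
	if forUpdate {
		return Exclusive
	}
	return None
}

func main() {
	fmt.Println(keyLockingFor(true), keyLockingFor(false)) // Exclusive None
}
```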
pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):
Previously, tbg (Tobias Grieger) wrote…
But it shouldn't? ISTM that it should only track intents and that it doesn't even have to know that an intent comes with a lock built-in. I might be missing something. But I definitely don't see a reason it needs to keep track of unreplicated locks.
Done.
pkg/kv/txn_interceptor_pipeliner.go, line 182 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ah, I missed earlier that the pipeliner keeps track of the locks solely for the purpose of handing them to EndTransaction in the end. That makes sense, though it would be nice to clarify where the locks matter (include in txn record) and where the writes do (pipeline).
Done.
pkg/kv/txn_interceptor_pipeliner.go, line 443 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ah, now I understand. The footprint only has spans, it doesn't care whether a lock is replicated or not. It's just for cleanup.
Yes, I tried to clarify this in a few places. Sorry for the confusion.
pkg/kv/kvserver/replica_evaluate_test.go, line 279 at r13 (raw file):
a) have a limit (=acquire only locks for keys returned) and
Good idea. Done for both MaxSpanRequestKeys and TargetBytes.
b) return tombstones (do we put locks on the tombstone keys and is this intentional? Add commentary on the flag that turns tombstone retrieval on).
The flags aren't really in the same place. Tombstones is on storage.MVCCScanOptions and KeyLocking is on ScanRequest. They don't really interact because cmd_scan.go never sets Tombstones=true.
The behavior we get from this is intentional though - we don't lock tombstone keys, just like Postgres doesn't lock deleted rows.
pkg/kv/kvserver/replica_evaluate_test.go, line 355 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
What hit this?
When I started running transactional batches I hit this.
pkg/sql/logictest/testdata/logic_test/fk, line 2662 at r13 (raw file):
Previously, tbg (Tobias Grieger) wrote…
If we acquired ranged locks on reads, would we just get a lock on b even if it's NULL, or would we still want to lock the PK?
I don't think I'm exactly understanding your question and how it relates to ranged locks, but I think I may be confusing you about why the NOT NULL part is important. Whether we acquire locks over scan ranges or just points within that range doesn't really matter. The important part here is that NOT NULL allows us to not scan the primary column family at all. If b wasn't marked as NOT NULL, we would need to scan the primary column family during the initial row fetch of the UPDATE, because b's family alone wouldn't tell us whether the row exists or not. This would have acquired an unreplicated lock on the primary column family, which would have blocked the INSERT. The optimizer knows that it only needs to update some columns and not others, but it would have been close to impossible to plumb this information down to the row fetcher and only set the KeyLocking mode on some of the scans. That also causes issues with span coalescing, because now we would need to worry about two adjacent column families that want different locking strengths.

So the key here is that NOT NULL allows us to not scan the primary index at all when performing the UPDATE statement's implicit FOR UPDATE scan. This was enabled by #44239 and is already being exploited in #45755 with YCSB.

I added more to the comment to make this clearer.
Don't forget to update the PR message with the updated release note before merge (not sure it matters, but I think they will both get picked up and so they better be the same).
Reviewed 59 of 59 files at r14, 14 of 14 files at r15, 7 of 7 files at r16, 34 of 34 files at r17.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
This commit renames the IntentSpans field in Transaction, TransactionRecord and EndTxnRequest from "IntentSpans" to "LockSpans". The field is intended to carry the spans of all locks, not just those of write intents.
Purely a rename along with a bit of comment cleanup.
This ensures that the calls to set the system config trigger are made before any part of the plan begins executing. The current approach would have been an issue after the next commit because TableReader plan nodes can now acquire locks, which means that they can cause a Txn to set its key before the mutation plan has its `startExec` method called. This change prevented the call to `maybeSetSystemConfig` from working correctly on statements like: `UPDATE system.zones SET config = $2 WHERE id = $1` With this change, things work correctly again.
Closes cockroachdb#40205.
Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.

cockroachdb#43775 described the three main benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve the performance of transactions that read, modify, and write contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan.
To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.
Force-pushed from ba77311 to a98be1f
Done.
Closes #40205.
Closes #43775.
Informs #41720.
This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.
The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.
#43775 described the three main benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.
Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve the performance of transactions that read, modify, and write contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan. To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.