-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libroach,storage: extend MVCC to support ignored seqnum ranges #42152
Conversation
Here's an embryo of an approach, with several questions open:
|
6c7a691
to
a420ec4
Compare
pkg/storage/engine/mvcc.go
Outdated
|
||
// Otherwise, we place back the write at that history entry | ||
// back into the intent. | ||
meta.Deleted = len(meta.IntentHistory[i].Value) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line in particular makes me a little bit afraid.
The intent history does not distinguish between an empty value array, and a Delete intent. Does this matter?
@nvanbenschoten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that doesn't matter. An empty value is a delete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good so far, though I'm less confident about how a rollback affects the intent history. It's possible it works out fine in practice, though.
Reviewed 12 of 12 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @petermattis)
c-deps/libroach/mvcc.h, line 235 at r1 (raw file):
// try the search again, using the current position as new // upper bound. end = intent_pos;
Given the intent history is sorted by sequence numbers anyway, it might be faster and simpler to just iterate back from intent_pos
until we hit an unignored sequence number, instead of doing a binary/range search again.
pkg/storage/engine/mvcc.go, line 2776 at r1 (raw file):
// has been rolled back. if i < 0 { err := engine.Clear(latestKey)
You would also need to clear the intent, which lives at MVCCMetadataKey(latestKey.Key)
.
pkg/storage/engine/mvcc.go, line 2783 at r1 (raw file):
// back into the intent. meta.Deleted = len(meta.IntentHistory[i].Value) == 0 meta.ValBytes = int64(len(meta.IntentHistory[i].Value))
The fact that we just switch up the current value and roll back a part of the intent history, while also leaving the history unmodified, feels a bit weird. It's probably correct, still, but I'd want to explore how this code interacts with getFromIntentHistory
more closely. Did you consider either 1) slicing away the history to only include the part up to i (then doing a putMeta to update it), or 2) appending the existing value to the intent history (even if it's an ignored seqnum), putMeta
-ing that, then sliding in the new value at latestKey
? Approach 2 is similar to how mvccPuts overwrite existing values in the same txn.
pkg/storage/engine/mvcc.go, line 2784 at r1 (raw file):
meta.Deleted = len(meta.IntentHistory[i].Value) == 0 meta.ValBytes = int64(len(meta.IntentHistory[i].Value)) meta.RawBytes = meta.IntentHistory[i].Value
meta.RawBytes
is meant for inlined values at the 0 timestamp. I don't think that's what you want to set here. It may make the read path shortcircuit early and return this even when it's not the right response.
pkg/storage/engine/mvcc.go, line 2786 at r1 (raw file):
meta.RawBytes = meta.IntentHistory[i].Value // And also overwrite whatever was there in storage. err = engine.Put(latestKey, meta.RawBytes)
Do a put at the MVCC meta timestamp before this, to update the metadata there. See how MVCCPut
updates the metadata, or the one case in mvccResolveWriteIntents
that does a putMeta
.
a420ec4
to
10e6a5e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
c-deps/libroach/mvcc.h, line 235 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Given the intent history is sorted by sequence numbers anyway, it might be faster and simpler to just iterate back from
intent_pos
until we hit an unignored sequence number, instead of doing a binary/range search again.
This was Nathan's idea. I'll let you two optimize this for performance once I get the correctness nailed down.
pkg/storage/engine/mvcc.go, line 2776 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
You would also need to clear the intent, which lives at
MVCCMetadataKey(latestKey.Key)
.
Done.
pkg/storage/engine/mvcc.go, line 2783 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
The fact that we just switch up the current value and roll back a part of the intent history, while also leaving the history unmodified, feels a bit weird. It's probably correct, still, but I'd want to explore how this code interacts with
getFromIntentHistory
more closely. Did you consider either 1) slicing away the history to only include the part up to i (then doing a putMeta to update it), or 2) appending the existing value to the intent history (even if it's an ignored seqnum),putMeta
-ing that, then sliding in the new value atlatestKey
? Approach 2 is similar to how mvccPuts overwrite existing values in the same txn.
Done.
pkg/storage/engine/mvcc.go, line 2784 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
meta.RawBytes
is meant for inlined values at the 0 timestamp. I don't think that's what you want to set here. It may make the read path shortcircuit early and return this even when it's not the right response.
Done.
pkg/storage/engine/mvcc.go, line 2786 at r1 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Do a put at the MVCC meta timestamp before this, to update the metadata there. See how
MVCCPut
updates the metadata, or the one case inmvccResolveWriteIntents
that does aputMeta
.
not needed, I rearranged the code to capitalize on the regular path instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @petermattis)
pkg/storage/engine/rocksdb.go, line 2637 at r2 (raw file):
r.sequence = C.int32_t(txn.Sequence) r.max_timestamp = goToCTimestamp(txn.MaxTimestamp) r.ignored_seqnums = goToCIgnoredSeqNums(txn.IgnoredSeqNums)
I'll let @itsbilal continue with the majority of this review. Just wanted to chime in to say I took a look at the way you're passing the ignored seqnum ranges to C++ and everything looks copacetic. Let me know if there are other bits you want me to scrutinize.
2172e5d
to
eea6608
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me so far. I assume we'll get more confident with this as it's better tested, but I don't see any obvious issues with the way it's structured now. Good work!
Reviewed 1 of 6 files at r2, 9 of 9 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)
The testing I am starting to implement is revealing that
The logic for cput makes all kinds of assumptions for optimization purposes, and these assumptions break if a seqnum has been reverted. |
Ok I succeeded in hammering CPut into behaving properly. See the This required reworking the "check previous value" path on both Put and CPut. A new thorough review is probably required. (Still remains to be done: Delete and perhaps other operators that are used transactionally in SQL). |
5cc6968
to
011e4b3
Compare
@itsbilal: this is about the point where I would enjoy you picking up the remainder of the work. This is because the PR/branch as-is is now approximately at a point where I can use it as basis for further KV/SQL changes. So I would like to do just this: detach my attention away from storage and simply assume that the branch as-is is "good enough" to start prototyping savepoints elsewhere. From here onward you could then proceed with pulling the PR forward to a point of completion that makes you happy about correctness, initial performance and pebble compatibility. (What I can do for you, before you pick it up, is producing reference unit tests for the desired Delete behavior.) |
@andreimatei maybe you can help me investigate the |
Well, superficially, the test fails because the CPut below apparently doesn't see the value set by the cockroach/pkg/kv/dist_sender_server_test.go Line 2519 in a32d225
And the So, I guess, your patch somehow causes a |
I was able to repro the TxnCoordSender error inside the mvcc package directly. Investigating. |
d'oh. Found it. I was forgetting to reset the list of ignored seqnums upon txn restarts. |
Well that was one thing, but there is something else beyond that. Still investigating... |
FWIW, the TCS test was not restarting a transaction.
I think on restart we have two options: either we keep the epoch number, in
which case we don't need to reset the ignored list, or we overwrite the
ignore list with 1.. current seq num and we get rid of the epoch field.
Right?
…On Mon, Nov 11, 2019, 8:07 AM kena ***@***.***> wrote:
Well that was one thing, but there is something else beyond that. Still
investigating...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#42152?email_source=notifications&email_token=AAC4C4IFFAS7MSM4XTV6K23QTFKJNA5CNFSM4JIWXH5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDWYTSY#issuecomment-552438219>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAC4C4IHNYF3AB7T75EFD23QTFKJNANCNFSM4JIWXH5A>
.
|
Yes it was, if my reading is correct (Also I've reproduced the problem, with a restart involved.)
we don't need but it doesn't hurt either (it won't be used if the intent is from a different epoch).
Yes, agreed. |
Further discussion with Nathan reveals the PR was incomplete as it did not properly extend |
a946ae5
to
fd34e08
Compare
a830a9d
to
0048635
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)
c-deps/libroach/mvcc.h, line 171 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'd leave a TODO mentioning that you could use binary search here to improve the complexity.
Done
c-deps/libroach/include/libroach.h, line 67 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/TxnMeta_IgnoredSeqNumRange/IgnoredSeqNumRange/
Done.
pkg/roachpb/data.proto, line 484 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Keep this at the bottom.
Done
pkg/roachpb/data.proto, line 492 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: rewrap this line
Done
pkg/storage/engine/mvcc.go, line 1258 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What's up with all of the spacing added to this file?
Where I was doing logging during development.
pkg/storage/engine/mvcc.go, line 1524 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't understand what "last write inside txn" means and how that relates to transaction epochs. Do you mean that the last write was written in the current epoch of txn? If so, I'd make that more specific.
Done
pkg/storage/engine/mvcc.go, line 1525 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Make a note that this is the common case.
Done
pkg/storage/engine/mvcc.go, line 2786 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"MVCC read"?
write. Fixed.
pkg/storage/engine/mvcc.go, line 2821 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/no new value any more/mvccRewriteIntentHistory already rewrote the key at the correct timestamp and adjusted the stats/
This brings up the question, did
mvccRewriteIntentHistory
adjust the stats? It doesn't look like it.
I think it does. I think the control path ensures the stats get adjusted properly in this case. Hopefully Bilal can double check.
pkg/storage/engine/mvcc.go, line 2942 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a bit more here. Rolls it back to what?
Clarified.
pkg/storage/engine/mvcc.go, line 2950 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: s/clear/cleared/
In fact, why don't we add a sentence about this to the function's comment.
Done
pkg/storage/engine/mvcc.go, line 2951 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're going to push this in here then the function should be called
mvccMaybeRewriteIntentHistory
.
Done
pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What if what was in storage is at a different timestamp? I don't think this is correctly handling the case where the transaction commits at a different timestamp than it wrote its intents at.
I conceptually understand what you say but fuzzy on the details. Bilal please pick this up.
pkg/storage/engine/mvcc_history_test.go, line 421 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is it expected in this framework for commands like these to combine or for them to overwrite each other?
For this command it overwrites.
pkg/storage/engine/enginepb/mvcc3.proto, line 120 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"inclusive on both ends"
Done
pkg/storage/intentresolver/intent_resolver.go, line 372 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're missing similar logic in
resolveLocalIntents
, which is a fast-path for resolving intents on a transaction record's local range in the same Raft entry as the one that moves the transaction record to committed.
Fixed
7d0cb02
to
6ce7b1c
Compare
I have removed the "debugging" field @nvanbenschoten can you check whether the intent resolution and batch eval tests I have added are OK. In this process I have also deprecated direct construction of (@itsbilal this is also good for rebase) |
5b635a7
to
6fe53e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check whether the intent resolution and batch eval tests I have added are OK.
The ones you added are good, though I think a few other ones are needed. See my comments below.
Reviewed 36 of 37 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)
c-deps/libroach/mvcc.h, line 171 at r6 (raw file):
Previously, knz (kena) wrote…
Done
s/nathan/nvanbenschoten/`
pkg/roachpb/data.go, line 2130 at r7 (raw file):
// MakeIntent makes an intent from the given span and txn. func MakeIntent(txn Transaction, span Span) Intent {
Let's change all of these functions to take their transaction protos by reference. None of them will cause them to escape. That said, they should all be inlinable anyway, but I think it's still worth doing.
pkg/roachpb/data.go, line 2139 at r7 (raw file):
// span and txn. This is suitable for use when constructing // WriteIntentError. func MakeErrorIntent(txn enginepb.TxnMeta, span Span) Intent {
I would rename this to MakePendingIntent
. I have no clue what an "error intent" is.
pkg/roachpb/data.proto, line 489 at r7 (raw file):
// where a range's end seqnum is equal to the next range's start // seqnum), and sorted in seqnum order. repeated storage.engine.enginepb.IgnoredSeqNumRange ignored_seqnums = 18
We want this to be stored in the transaction record as well, right? So it should be in the TransactionRecord
proto as well.
This indicates that we're missing a test where a transaction commits but doesn't resolve its intents. Another transaction then pushes it and is forced to resolve its intents for it. This intent resolution needs to know about the sequence numbers that were rolled back. I believe that most of the tests at this level are in replica_test.go
.
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1059 at r7 (raw file):
} func TestPartialRollbackOnEndTransaction(t *testing.T) {
Give this test a quick description.
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1084 at r7 (raw file):
// Write a first value at key. v.SetString("a") txn.Sequence = 0
nit: the first write will always have seq #1, not #0.
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1110 at r7 (raw file):
desc: &desc, canCreateTxnFn: func() (bool, hlc.Timestamp, roachpb.TransactionAbortedReason) { //t.Fatal("CanCreateTxnRecord unexpectedly called")
Remove this.
pkg/storage/batcheval/cmd_resolve_intent_test.go, line 239 at r7 (raw file):
} func TestResolveIntentAfterPartialRollback(t *testing.T) {
Same comment about leaving a short description here.
pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):
Previously, knz (kena) wrote…
I conceptually understand what you say but fuzzy on the details. Bilal please pick this up.
Reminder that this still needs to be looked at.
pkg/storage/engine/mvcc.go, line 2775 at r7 (raw file):
// epoch and the state is still in progress but the intent was not pushed // to a larger timestamp, and if the rollback code did not modify or collapse // the intent.
Do we have any tests that try to resolve an intent with a PENDING status but where the current value has been collapsed? In other words, do we test the inProgress && !pushed && rolledBackVal == nil && collapsedIntent
case?
pkg/storage/engine/enginepb/mvcc_test.go, line 78 at r7 (raw file):
func TestTxnSeqIsIgnored(t *testing.T) { type s = enginepb.TxnSeq type r = enginepb.IgnoredSeqNumRange
I haven't seen that pattern better. I like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)
pkg/roachpb/data.proto, line 489 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We want this to be stored in the transaction record as well, right? So it should be in the
TransactionRecord
proto as well.This indicates that we're missing a test where a transaction commits but doesn't resolve its intents. Another transaction then pushes it and is forced to resolve its intents for it. This intent resolution needs to know about the sequence numbers that were rolled back. I believe that most of the tests at this level are in
replica_test.go
.
wut watnow
What is this TransactionRecord
thing that I've never seen nor heard about before.
I find it informative that I was able to come this far and see so many tests succeed without even knowing about this. Is it possible that that thing is actually useless and could be removed?
6fe53e7
to
31f3c8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)
c-deps/libroach/mvcc.h, line 171 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/nathan/nvanbenschoten/`
Done.
pkg/roachpb/data.go, line 2130 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's change all of these functions to take their transaction protos by reference. None of them will cause them to escape. That said, they should all be inlinable anyway, but I think it's still worth doing.
Done.
nit: You using "let's" tells me you want to participate, so I must wait until you push an amend to this commit. I doubt this is really what you want.
If that is what you want, say "please do this". It's more likely than not that I will follow your lead.
If you're in doubt, use "I would prefer" or "I would rather". I would always consider your preference carefully.
Either is clearer for me and more to the point.
pkg/roachpb/data.go, line 2139 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I would rename this to
MakePendingIntent
. I have no clue what an "error intent" is.
Done.
pkg/roachpb/data.go, line 2148 at r7 (raw file):
// SetTxn updates the transaction details in the intent. func (i *Intent) SetTxn(txn Transaction) {
@nvanbenschoten SetTxn is called in updateIntentTxnStatus
(intent_resolver.go) using a Transaction object pulled from pushedTxns map[...]roachpb.Transaction
.
Using a reference in the argument risks forcing a heap allocation, wouldn't it?
pkg/roachpb/data.proto, line 489 at r7 (raw file):
Previously, knz (kena) wrote…
wut watnow
What is this
TransactionRecord
thing that I've never seen nor heard about before.I find it informative that I was able to come this far and see so many tests succeed without even knowing about this. Is it possible that that thing is actually useless and could be removed?
Ok so I performed the change on TransactionRecord
.
I the test about its persistence as part of the testing in cmd_end_transaction_test.go
to verify the ignore list is persisted. PTAL at that.
Regarding the other thing you say: "a transaction commits but doesn't resolve its intents. Another transaction then pushes it and is forced to resolve its intents for it."
Can you point me to the code where another txn resolves intents after a push? I want to review this code.
As to how to do this testing. As per the explanation in TestPushTxnAlreadyCommittedOrAborted
I see the following 3 cases:
-
the pusher finds the pushee to be aborted. In this case we don't care about anything because the writes will be discarded anyway.
-
the pusher finds the pushee to be committed. In this case:
- either there is a persisted TransactionRecord, in which case the ignore list will be populated properly. I suppose I could add a test for this (please confirm).
- or, there is no persisted TransactionRecord, in which case we're fine: this is only possible if the pushee has performed intent resolution synchronously itself (accounting for the ignore list).
-
the pusher finds the pushee to be pending. In this case I believe we're fine too because the pushee will not attempt intent resolution in this case. Can you confirm?
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1059 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this test a quick description.
I like the "Give" here. Good 👍
Done.
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1084 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: the first write will always have seq #1, not #0.
Thanks, fixed.
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1110 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Remove this.
I 💖 the imperative.
Done.
pkg/storage/batcheval/cmd_end_transaction_test.go, line 1110 at r8 (raw file):
// list is properly persisted in the stored transaction record. txnKey := keys.TransactionKey(txn.Key, txn.ID) if storeTxnBeforeEndTxn {
@nvanbenschoten this is new, PTAL.
pkg/storage/batcheval/cmd_resolve_intent_test.go, line 239 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same comment about leaving a short description here.
Done.
pkg/storage/batcheval/transaction.go, line 156 at r8 (raw file):
// TxnMeta. Proceeding to KV reads or intent resolution without this // information would cause a partial rollback, if any, to be reverted // and yield inconsistent data.
@nvanbenschoten can you check this comment and confirm the following: a synthetized Transaction is currently never used to perform intent resolution or KV reads, or is it?
pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums, line 1 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
When are you planning to address this limitation?
This is lifted in #42582.
pkg/storage/intentresolver/intent_resolver.go, line 372 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This doesn't appear to be tested to me. We have a lot of low-level MVCC tests, but I think we also need some replica level tests that issue
EndTransaction
requests and verify that intents are partially rolled back as expected. These can be written either at thepkg/storage
level (seereplica_test.go
) or at thepkg/storage/batcheval
level (seecmd_end_transaction_test.go
).
Wasn't this precisely what the new test in cmd_end_transaction_test.go
is checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itsbilal there are still 3 points below for you to look into for #42582 when you come back. Thanks.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
pkg/storage/engine/mvcc.go, line 2821 at r6 (raw file):
Previously, knz (kena) wrote…
I think it does. I think the control path ensures the stats get adjusted properly in this case. Hopefully Bilal can double check.
@itsbilal can you double-check this again.
pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Reminder that this still needs to be looked at.
@itsbilal can you review this.
pkg/storage/engine/mvcc.go, line 2775 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we have any tests that try to resolve an intent with a PENDING status but where the current value has been collapsed? In other words, do we test the
inProgress && !pushed && rolledBackVal == nil && collapsedIntent
case?
I don't know. @itsbilal can you look into this? Thanks.
31f3c8d
to
c685bda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)
pkg/storage/engine/mvcc.go, line 2821 at r6 (raw file):
Previously, knz (kena) wrote…
@itsbilal can you double-check this again.
Updated the control path in #42582 for the collapsedIntent case so that it now falls down to the intent abort case, where stats are also updated.
pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):
Previously, knz (kena) wrote…
@itsbilal can you review this.
This code is run before the MVCC value moves timestamps, so writing at the old timestamp (i.e. latestKey
) is fine. There was one caveat where the caller wasn't seeing restoredVal
, so (in #42582) I updated this code to return the restoredVal in that case.
pkg/storage/engine/mvcc.go, line 2775 at r7 (raw file):
Previously, knz (kena) wrote…
I don't know. @itsbilal can you look into this? Thanks.
Added a test for this in #42582, as well as one for collapsedIntent && pushed
which we weren't handling well earlier.
42582: storage: Update MVCC on {Rocks,Pebble} to support ignored seqnum ranges and rollbacks r=itsbilal a=itsbilal Builds on Raphael's PR #42152 and adds two commits, the second of which improves testing and the first of which adds the pebble MVCC scanner part of the ignore list functionality described in #41612, as well as restructures code in `mvccResolveWriteIntent` for better readability and to make it usable in the intent-history-cleanup-only case. Fixes #41612 . Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Bilal Akhtar <[email protected]>
c685bda
to
f6a4dc5
Compare
The Intent message type has been a frequent source of complexity and confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to add an IgnoredSeqNums field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when cockroachdb#44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for cockroachdb#44976 and I'm feeling a little pressed for time with that change and the impending stability period.
The Intent message type has been a frequent source of complexity and confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to add an IgnoredSeqNums field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when cockroachdb#44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for cockroachdb#44976 and I'm feeling a little pressed for time with that change and the impending stability period.
The Intent message type has been a frequent source of complexity and confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to add an IgnoredSeqNums field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when cockroachdb#44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for cockroachdb#44976 and I'm feeling a little pressed for time with that change and the impending stability period.
45235: roachpb: split Intent proto message into Intent and LockUpdate messages r=nvanbenschoten a=nvanbenschoten The Intent message type has been a frequent source of complexity and confusion. @knz found this out first-hand in #42152/#42582 when he tried to add an `IgnoredSeqNums` field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when #44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan-out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for #44976 and I'm feeling a little pressed for time with that change and the impending stability period. Co-authored-by: Nathan VanBenschoten <[email protected]>
Fixes #41612. (At least, aims to)
The MVCC code already had rudimentary understanding of sequence
numbers to allow reads to ignore writes at greater seqnums.
To implement SQL savepoint rollbacks, we must also support
ignoring writes that fall in ignored ranges of seqnums.
To achieve this, this commit extends the
mvccScanner
forRocksDB (Pebble code remains to be done) to account for
ignored seqnum ranges, and also extends
MVCCResolveWriteIntent
to collapse an intent to the last write that has not been marked
to be ignored by a savepoint rollback.
Release note: None