storage/concurrency: prep lock-table for 20.1 release #44976

Closed
29 of 35 tasks
nvanbenschoten opened this issue Feb 11, 2020 · 1 comment
Labels: A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team

nvanbenschoten commented Feb 11, 2020

Release 20.1 will contain a new approach to kv-level transaction locking. Instead of storing all locks inline in the MVCC keyspace and finding locks during request evaluation, locks will now be maintained in a separate lock-table. While replicated locks (intents) will not be pulled into a persistent lock-table keyspace during this release, the lock-table structure will track these locks, as well as unreplicated (best-effort) locks which enable SELECT FOR UPDATE (see #40205). Additionally, the lock-table structure will lead to significant improvements in transaction queuing. The lock-table's lock wait-queues will replace the current contentionQueue, leading to improved fairness between transactions and reduced latency now that the lock wait-queues react directly to intent resolution. For more on this change, see #41720.

The rest of this issue enumerates the remaining work items for the lock-table and storage/concurrency package that should get into this release. It does not include any work items that are specific to SELECT FOR UPDATE's use of the storage/concurrency package.
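The lock-table idea above can be made concrete with a minimal Go sketch: a map from keys to a lock holder plus a FIFO wait-queue, where releasing a lock hands it directly to the head of the queue (the "react directly to intent resolution" behavior). Everything here is invented for illustration; the real lockTable is a concurrent btree with far richer state.

```go
package main

import "fmt"

// txnID identifies a transaction; a hypothetical stand-in for
// CockroachDB's UUID-based transaction IDs.
type txnID string

// lockState tracks one key's lock: the current holder (if any) and a
// FIFO wait-queue of requests, loosely mirroring the per-key
// wait-queues that replace the old contentionQueue.
type lockState struct {
	holder  txnID
	waiters []txnID
}

// lockTable maps keys to per-key lock state. A plain map keeps the
// sketch short; it is not safe for concurrent use.
type lockTable map[string]*lockState

// tryAcquire grants the lock if it is free or already held by txn;
// otherwise it enqueues txn in the key's wait-queue and reports false.
func (lt lockTable) tryAcquire(key string, txn txnID) bool {
	ls, ok := lt[key]
	if !ok {
		lt[key] = &lockState{holder: txn}
		return true
	}
	if ls.holder == "" || ls.holder == txn {
		ls.holder = txn
		return true
	}
	ls.waiters = append(ls.waiters, txn)
	return false
}

// release hands the lock to the head of the wait-queue, reacting
// directly to lock release rather than polling for it.
func (lt lockTable) release(key string) (next txnID) {
	ls := lt[key]
	ls.holder = ""
	if len(ls.waiters) > 0 {
		next, ls.waiters = ls.waiters[0], ls.waiters[1:]
		ls.holder = next
	}
	return next
}

func main() {
	lt := lockTable{}
	fmt.Println(lt.tryAcquire("k", "txn1")) // txn1 takes the free lock
	fmt.Println(lt.tryAcquire("k", "txn2")) // txn2 queues behind txn1
	fmt.Println(lt.release("k"))            // txn2 inherits the lock
}
```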

Must Have

Nice To Have

  • use txn priorities when ordering requests, so that a higher-priority request never waits on a lower-priority reservation holder @sumeerbhola
  • benchmark comparison between lazy waiting (current) and eager waiting in contended workloads @sumeerbhola
  • re-introduce invariant that reservation holder is always the request that acquires locks
  • distinguish in lockTableGuardImpl between waiting on lock holder and waiting on queued requests storage/concurrency: implement lockTableWaiter #44885
  • add visibility into state of lockTable through metrics and/or debug pages: kv/concurrency: expose lock table counters, introduce timeseries metrics #67350
  • simplify state transitions in acquireLock by not removing from lock holder's lockTableGuardImpl.mu.locks
  • eliminate waiting details except when waiting on lock holder, and do related code simplification @sumeerbhola
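The priority-ordering item in the list above amounts to keeping each wait-queue sorted by transaction priority rather than strictly FIFO, so a high-priority request overtakes lower-priority waiters. A hedged sketch, with invented names and a plain slice for the queue:

```go
package main

import (
	"fmt"
	"sort"
)

// waiter is a queued request carrying its transaction's priority.
// The field names are illustrative, not CockroachDB's.
type waiter struct {
	txn      string
	priority int
}

// enqueue inserts w while keeping the queue sorted by descending
// priority, so a higher-priority request is ordered ahead of
// lower-priority waiters instead of queuing strictly FIFO.
func enqueue(q []waiter, w waiter) []waiter {
	i := sort.Search(len(q), func(i int) bool { return q[i].priority < w.priority })
	q = append(q, waiter{})
	copy(q[i+1:], q[i:])
	q[i] = w
	return q
}

func main() {
	var q []waiter
	q = enqueue(q, waiter{"low", 1})
	q = enqueue(q, waiter{"high", 10})
	q = enqueue(q, waiter{"mid", 5})
	for _, w := range q {
		fmt.Println(w.txn) // high, mid, low
	}
}
```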

Feel free to modify. Please check off when changes merge.

@nvanbenschoten nvanbenschoten added the A-kv-transactions Relating to MVCC and the transactional model. label Feb 11, 2020
@nvanbenschoten nvanbenschoten added this to the 20.1 milestone Feb 11, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 13, 2020
Informs cockroachdb#41720.
Informs cockroachdb#44976.

This PR implements the concurrency.Manager interface, which is the
core structure that ties together the new concurrency package.

The concurrency manager is a structure that sequences incoming requests
and provides isolation between requests that intend to perform
conflicting operations. During sequencing, conflicts are discovered and
any found are resolved through a combination of passive queuing and
active pushing. Once a request has been sequenced, it is free to
evaluate without concerns of conflicting with other in-flight requests
due to the isolation provided by the manager. This isolation is
guaranteed for the lifetime of the request but terminates once the
request completes.

The manager accomplishes this by piecing together the following components
in its request sequencing path:
- `latchManager`
- `lockTable`
- `lockTableWaiter`
- `txnWaitQueue`
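The sequencing path that composes these components can be sketched as a retry loop: acquire latches, scan for conflicting locks, and on conflict release latches and wait before retrying. Only the component names come from the list above; the method shapes and all behavior below are invented for illustration.

```go
package main

import "fmt"

// Tiny interfaces standing in for the components the PR lists.
type latchManager interface{ acquire(req string) (release func()) }
type lockTable interface {
	scanForConflict(req string) (conflict string, found bool)
}
type lockTableWaiter interface{ waitOn(conflict string) }

// manager sequences a request: a rough shape of the "passive queuing
// and active pushing" the description mentions.
type manager struct {
	lm  latchManager
	lt  lockTable
	lw  lockTableWaiter
	log []string
}

func (m *manager) sequence(req string) {
	for {
		release := m.lm.acquire(req)
		if conflict, found := m.lt.scanForConflict(req); found {
			release() // don't hold latches while waiting
			m.lw.waitOn(conflict)
			continue
		}
		m.log = append(m.log, "sequenced "+req)
		release()
		return
	}
}

// fakes implements all three components; each scan "resolves" one
// conflict, as if the holder's lock were released while we waited.
type fakes struct{ conflictsLeft int }

func (f *fakes) acquire(req string) func() { return func() {} }
func (f *fakes) scanForConflict(req string) (string, bool) {
	if f.conflictsLeft > 0 {
		f.conflictsLeft--
		return "txn1", true
	}
	return "", false
}
func (f *fakes) waitOn(conflict string) {}

func main() {
	f := &fakes{conflictsLeft: 1}
	m := &manager{lm: f, lt: f, lw: f}
	m.sequence("get(k)")
	fmt.Println(m.log[0]) // sequenced get(k)
}
```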

The largest part of this change is introducing the datadriven testing
framework to deterministically test the concurrency manager. This proved
difficult for two reasons:
1. the concurrency manager composes several components to perform its
   work (latchManager, lockTable, lockTableWaiter, txnWaitQueue). It was
   difficult to get consistent observability into each of these components
   in such a way that tests could be run against a set of concurrent requests
   interacting with them all.
2. the concurrency manager exposes a primarily blocking interface. Requests
   call `Sequence()` and wait for sequencing to complete. This may block in
   a number of different places - while waiting on latches, while waiting on
   locks, and while waiting on other transactions. The most important part
   of these tests is to assert _where_ a given request blocks based on the
   current state of the concurrency manager and then assert _how_ the request
   reacts to a state transition by another request.

To address the first problem, the testing harness uses the context-carried
tracing infrastructure to track the path of a request. We already had log
events scattered throughout these various components, so this did not require
digging testing hooks into each of them. Instead, the harness attaches a
trace recording span to each request and watches as events are added to the
span. It then uses these events as the output of the test.

To address the second problem, the testing harness introduces a monitor
object which manages a collection of "monitored" goroutines. The monitor
watches as these goroutines run and keeps track of their goroutine state as
is reported by a goroutine dump. During each step of the datadriven test, the
monitor allows all goroutines to proceed until they have either terminated or
stalled due to cross-goroutine synchronization dependencies. For instance, it
waits for all goroutines to stall while receiving from channels. We can be
sure that the goroutine dump provides a consistent snapshot of all goroutine
states and statuses because `runtime.Stack(all=true)` stops the world when
called. This means that when all monitored goroutines are simultaneously
stalled, we have a deadlock that can only be resolved by proceeding forward
with the test and releasing latches, resolving locks, or committing
transactions. This structure worked surprisingly well and has held up to
long periods of stressrace.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 14, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 18, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 18, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 18, 2020
craig bot pushed a commit that referenced this issue Feb 19, 2020
45062: storage/concurrency: implement concurrency Manager r=nvanbenschoten a=nvanbenschoten

Informs #41720.
Informs #44976.
Initially drawn out in #43775.

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 20, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first-hand when they tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- does this Intent message contain the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.
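A rough Go rendering of the split can make the two roles concrete. The actual messages are wire-compatible protobufs in `pkg/roachpb`; the struct shapes below are illustrative only.

```go
package main

import "fmt"

// Durability distinguishes replicated locks (intents) from the
// best-effort unreplicated locks. Values mirror the concepts in the
// commit; the Go representation is invented.
type Durability int

const (
	Replicated Durability = iota
	Unreplicated
)

// Intent references a persistent on-disk write intent discovered on
// the return path of an MVCC operation. Note that it carries no
// transaction status field.
type Intent struct {
	Key string
	Txn string
}

// LockUpdate describes an update to apply to all locks a transaction
// holds within a span, created only after the transaction's record
// has been consulted (after an EndTxn or PushTxn request).
type LockUpdate struct {
	Span       [2]string
	Txn        string
	Status     string // e.g. COMMITTED or ABORTED
	Durability Durability
}

func main() {
	in := Intent{Key: "a", Txn: "txn1"}
	up := LockUpdate{Span: [2]string{"a", "c"}, Txn: "txn1",
		Status: "COMMITTED", Durability: Replicated}
	fmt.Printf("found intent on %s; updating locks in [%s,%s) to %s\n",
		in.Key, up.Span[0], up.Span[1], up.Status)
}
```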

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 20, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 21, 2020
craig bot pushed a commit that referenced this issue Feb 21, 2020
45235: roachpb: split Intent proto message into Intent and LockUpdate messages r=nvanbenschoten a=nvanbenschoten

The Intent message type has been a frequent source of complexity and confusion. @knz found this out first-hand in #42152/#42582 when he tried to add an `IgnoredSeqNums` field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents.

This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when #44976 formally introduces unreplicated locks to the key-value layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for #44976 and I'm feeling a little pressed for time with that change and the impending stability period.

Co-authored-by: Nathan VanBenschoten <[email protected]>
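To make the split concrete, here is a minimal Go sketch of the two message shapes. The field sets and names are illustrative assumptions for exposition only; the real protobuf definitions live in `pkg/roachpb` and differ in detail.

```go
package main

import "fmt"

// TxnMeta stands in for the transaction metadata both messages carry.
type TxnMeta struct{ ID string }

// Durability mirrors the new field on LockUpdate: replicated locks
// (write intents) survive through Raft; unreplicated locks are
// best-effort and in-memory only.
type Durability int

const (
	Replicated Durability = iota
	Unreplicated
)

// Intent references a single persistent on-disk write intent that was
// encountered during MVCC evaluation. It is implicitly PENDING, so it
// carries no Status field.
type Intent struct {
	Txn TxnMeta
	Key string
}

// LockUpdate describes an update to apply to all locks held by Txn in
// the span [Key, EndKey). It is built only after the transaction
// record has been consulted, so it can carry an authoritative Status.
type LockUpdate struct {
	Txn            TxnMeta
	Key, EndKey    string
	Status         string // e.g. "COMMITTED" or "ABORTED"
	IgnoredSeqNums []int32
	Durability     Durability
}

func main() {
	up := LockUpdate{
		Txn: TxnMeta{ID: "txn-1"}, Key: "a", EndKey: "z",
		Status: "COMMITTED", Durability: Replicated,
	}
	fmt.Println(up.Status, up.Durability == Replicated) // prints: COMMITTED true
}
```

The split means a reader can no longer ask "what Status does this Intent have?" - the type system answers it for them.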
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 27, 2020
Related to cockroachdb#41720.
Related to cockroachdb#44976.

This commit integrates the new concurrency package into the storage
package. Each Replica is given a concurrency manager, which replaces
its existing latch manager and txn wait queue. The change also uses
the concurrency manager to simplify the role of the intent resolver.
The intent resolver no longer directly handles WriteIntentErrors. As
a result, we are able to delete the contention queue entirely.

With this change, all requests are now sequenced through the concurrency
manager. When sequencing, latches are acquired and conflicting locks are
detected. If any locks are found, the requests wait in lock wait-queues
for the locks to be resolved. This is a major deviation from how things
currently work because today, even with the contention queue, requests
end up waiting for conflicting transactions to commit/abort in the
txnWaitQueue after at least one RPC. Now, requests wait directly next
to the intents/locks that they are waiting on and react directly to the
resolution of these intents/locks.
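That sequencing flow can be sketched roughly as follows. All names here (`request`, `lockTable`, `sequence`) are hypothetical stand-ins, not the real storage/concurrency interfaces.

```go
package main

import "fmt"

type request struct {
	txnID string
	key   string
}

type lockTable struct {
	// key -> ID of the txn holding the lock; a real lock table also
	// tracks wait-queues, lock strength, and durability per lock.
	held map[string]string
}

// sequence acquires latches, then scans for conflicting locks. On a
// conflict the request would enqueue in that lock's wait-queue and
// block until resolution; here we just report the conflict.
func sequence(lt *lockTable, req request) (conflict string, ok bool) {
	// (1) Latch acquisition would go here: short-lived mutual
	//     exclusion with overlapping in-flight requests.
	// (2) Lock-table scan:
	if holder, found := lt.held[req.key]; found && holder != req.txnID {
		return holder, false // wait in this lock's wait-queue
	}
	return "", true // fully sequenced; proceed to evaluation
}

func main() {
	lt := &lockTable{held: map[string]string{"k1": "txnA"}}
	fmt.Println(sequence(lt, request{txnID: "txnB", key: "k1"}))
	fmt.Println(sequence(lt, request{txnID: "txnB", key: "k2"}))
}
```

The key property is that a blocked request is parked next to the specific lock it conflicts with, so releasing that lock can wake it directly, without an extra RPC round-trip.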

Once requests are sequenced by the concurrency manager, they are
theoretically fully isolated from all conflicting requests. However,
this is not strictly true today because we have not yet pulled all
replicated locks into the concurrency manager's lock table. We will
do so in a future change. Until then, the concurrency manager maintains
a notion of "intent discovery", which is integrated into the Replica-level
concurrency retry loop.

Performance numbers will be published shortly. This will be followed
by performance numbers using the SELECT FOR UPDATE locking (cockroachdb#40205)
improvements that this change enables.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 28, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 2, 2020
craig bot pushed a commit that referenced this issue Mar 2, 2020
45269: importccl: Parallelize avro import r=miretskiy a=miretskiy

Parallelize the Avro importer to improve its throughput (a 2.8x improvement).

Touches #40374.
Fixes #45097.

Release notes (performance): Faster Avro import
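A change like this typically follows the standard Go fan-out shape: a producer feeds record batches to a pool of decoding workers. The sketch below shows only that generic pattern, not the actual importccl code (`parallelDecode` is a hypothetical name):

```go
package main

import (
	"fmt"
	"sync"
)

// parallelDecode fans batches out to `workers` goroutines, each running
// the CPU-heavy decode step, and sums their results under a mutex.
func parallelDecode(batches []string, workers int, decode func(string) int) int {
	in := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range in {
				n := decode(b) // CPU-heavy Avro decoding would happen here
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}
	for _, b := range batches {
		in <- b
	}
	close(in)
	wg.Wait()
	return total
}

func main() {
	n := parallelDecode([]string{"a", "bb", "ccc"}, 2, func(s string) int { return len(s) })
	fmt.Println(n) // prints: 6
}
```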

45482: storage: integrate Concurrency Manager into Replica request path r=nvanbenschoten a=nvanbenschoten

Related to #41720.
Related to #44976.


45484: sql: simplify connection state machine - stop tracking retry intent r=andreimatei a=andreimatei

Before this patch, the SQL connection state machine had an optimization:
if a transaction that hadn't used "SAVEPOINT cockroach_restart"
encountered a retriable error that we couldn't auto-retry, we'd release
the txn's locks eagerly and enter the Aborted state, as opposed to
transactions that had used "SAVEPOINT cockroach_restart", which went to
RestartWait.
This optimization was a significant complication for the state machine,
so this patch removes it. All transactions now go to RestartWait and
wait for a ROLLBACK to release their locks.

On the flip side, "RELEASE SAVEPOINT cockroach_restart" and
"ROLLBACK TO SAVEPOINT cockroach_restart" now work even for transactions
that haven't explicitly declared that savepoint, which is nice, although
I don't promise I'll keep it working.

Release note: None
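The simplified transition can be written as a tiny state function. The state names below are illustrative stand-ins, not the real conn_executor types:

```go
package main

import "fmt"

type txnState string

const (
	Open        txnState = "Open"
	RestartWait txnState = "RestartWait"
	Aborted     txnState = "Aborted"
)

// onRetriableError returns the state after a non-auto-retriable
// retriable error. Before the patch, usedRestartSavepoint picked
// between Aborted (locks released eagerly) and RestartWait; now it is
// ignored and every txn waits in RestartWait for a ROLLBACK to
// release its locks.
func onRetriableError(s txnState, usedRestartSavepoint bool) txnState {
	_ = usedRestartSavepoint // no longer consulted
	return RestartWait
}

func main() {
	fmt.Println(onRetriableError(Open, false)) // prints: RestartWait
	fmt.Println(onRetriableError(Open, true))  // prints: RestartWait
}
```

Dropping the `usedRestartSavepoint` branch is what removes the extra path through the state machine.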

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@nvanbenschoten
Member Author

Closing. This issue served us well, but it is now stale.
