kv: support unreplicated locks, hook up SELECT FOR UPDATE end-to-end #45701

Merged

Conversation

nvanbenschoten
Member

@nvanbenschoten nvanbenschoten commented Mar 4, 2020

Closes #40205.
Closes #43775.
Informs #41720.

This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.
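
As a rough sketch of how this looks from the KV client (using the new `ScanForUpdate` batch method; the wrapper below is illustrative, not code from this PR):

```go
package example

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/internal/client"
)

// scanForUpdate is a hypothetical helper. It scans [start, end) while asking
// the server to acquire an unreplicated Exclusive lock on each returned key,
// i.e. a Scan with its KeyLocking strength raised from None to Exclusive.
func scanForUpdate(
	ctx context.Context, txn *client.Txn, start, end interface{},
) ([]client.KeyValue, error) {
	b := txn.NewBatch()
	b.ScanForUpdate(start, end)
	if err := txn.Run(ctx, b); err != nil {
		return nil, err
	}
	return b.Results[0].Rows, nil
}
```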

The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.

#43775 described the three main benefits of this change:

  • higher throughput under contention
  • lower latency and improved fairness under contention
  • a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve the performance of transactions that read, modify, and write contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan. To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.
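
For example (output is illustrative; table `kv` and the exact EXPLAIN layout are assumptions), an UPDATE whose filter is pushed into the scan reports the lock on its scan node:

```
root@:26257/defaultdb> EXPLAIN UPDATE kv SET v = v + 1 WHERE k = 1;
    tree    |      field       |  description
------------+------------------+---------------
  ...       |                  |
   └── scan |                  |
            | table            | kv@primary
            | locking strength | for update
```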

@cockroach-teamcity
Member

This change is Reviewable

Contributor

@andreimatei andreimatei left a comment


LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/internal/client/batch.go, line 557 at r6 (raw file):

// each of the returned keys.
//
// A new result will be appended to the batch which will contain  "rows" (each

double space


pkg/internal/client/db.go, line 419 at r6 (raw file):

	maxRows int64,
	isReverse bool,
	forUpdate bool,

at least define forUpdate = true, notForUpdate = false constants.
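
(Concretely, the suggestion amounts to something like this; names hypothetical:)

```go
// Named constants make the call sites self-documenting:
const (
	forUpdate    = true
	notForUpdate = false
)

// db.scan(ctx, begin, end, maxRows, isReverse, forUpdate)
// reads better than:
// db.scan(ctx, begin, end, maxRows, isReverse, true)
```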


pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):

	// The transaction's lock footprint contains spans where locks have been
	// acquired at some point by the transaction. The span set contains spans
	// encompassing the lock spans from all writes that have already been proven

please add the word "intent" in here somewhere and make it clear that the footprint contains both replicated and unreplicated locks.


pkg/storage/engine/mvcc.go, line 2734 at r5 (raw file):

				intent.Txn.WriteTimestamp, false, unsafeValue, nil, gbuf)
			if err != nil {
				log.Warningf(ctx, "unable to find value for %s @ %s: %v ",

I'm glad to see this warning gone.

@nvanbenschoten
Member Author

@rytaft do you mind signing off on the `sql: move calls to planner.maybeSetSystemConfig into execFactory` commit? This relates to the discussion we were having on Slack.

Collaborator

@rytaft rytaft left a comment


The sql commit LGTM

Member

@tbg tbg left a comment


Switching to pure GitHub review for the rest; Reviewable keeps showing me things I've already seen, probably because of the rebase.

Reviewed 80 of 80 files at r7, 5 of 5 files at r8, 15 of 15 files at r9, 39 of 39 files at r10, 14 of 14 files at r11, 7 of 7 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/internal/client/db.go, line 419 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

at least define forUpdate = true, notForUpdate = false constants.

Also curious why this is called "forUpdate", which is a SQL concept, while this is the KV API. `lock lock.Mode` comes to mind.

No need to do anything here now, just interested in the conversation.


pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please add the word "intent" in here somewhere and make it clear that the footprint contains both replicated and unreplicated locks.

But it shouldn't? ISTM that it should only track intents and that it doesn't even have to know that an intent comes with a lock built-in. I might be missing something. But I definitely don't see a reason it needs to keep track of unreplicated locks.

@tbg tbg self-requested a review March 5, 2020 10:30
},
},
{
// Ditto in reverse.

Test the interaction of scans that a) have a limit (=acquire only locks for keys returned) and b) return tombstones (do we put locks on the tombstone keys and is this intentional? Add commentary on the flag that turns tombstone retrieval on).

@@ -256,7 +352,8 @@ func TestEvaluateBatch(t *testing.T) {
idKey: storagebase.CmdIDKey("testing"),
eng: eng,
}
d.ba.Header.Timestamp = hlc.Timestamp{WallTime: 1}
d.AbortSpan = abortspan.New(1)

What hit this?

// and all writes, in-flight or not, at the end of prior epochs. All of the
// transaction's in-flight writes are morally in this set as well, but they
// are not stored here to avoid duplication.
// The transaction's lock footprint contains spans where locks have been

Ah, I missed earlier that the pipeliner keeps track of the locks solely for the purpose of handing them to EndTransaction in the end. That makes sense, though it would be nice to clarify where the locks matter (include in txn record) and where the writes do (pipeline).

// After updating the write sets, the write footprint is condensed to ensure
// that it remains under its memory limit.
// After updating the write sets, the lock footprint is condensed to ensure that
// it remains under its memory limit.

Ah, now I understand. The footprint only has spans, it doesn't care whether a lock is replicated or not. It's just for cleanup.

# scanning the primary column family.
# scanning the primary column family. A critical reason why this works is
# because column b is NOT NULL, so the UPDATE statement does not acquire
# FOR UPDATE locks on the primary column family.

If we acquired ranged locks on reads, would we just get a lock on b even if it's NULL, or would we still want to lock the PK?

@tbg
Member

tbg commented Mar 5, 2020

In the release note, could you be a little more precise on when we implicitly apply FOR UPDATE (only when all filters are pushed down, also for UPSERT, etc)? Also, that we only support exclusive locks (auto-upgrade to that). Also, a way to use EXPLAIN to verify whether locks are implicitly acquired.

Very happy to see this all go in.

Member

@tbg tbg left a comment


Reviewed 35 of 35 files at r13.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 5, 2020
This change marks all columns in the YCSB "usertable" as NOT NULL. Doing
so allows the load generator to take advantage of cockroachdb#44239, which avoids
the KV lookup on the primary column family entirely when querying one of
the other column families. The query plans before and after demonstrate
this:

```
--- Before
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |               description
------------+-------------+------------------------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/0-/"key"/1 /"key"/6/1-/"key"/6/2
            | parallel    |

--- After
root@:26257/ycsb> EXPLAIN SELECT field5 FROM usertable WHERE ycsb_key = 'key';
    tree    |    field    |      description
------------+-------------+------------------------
            | distributed | false
            | vectorized  | false
  render    |             |
   └── scan |             |
            | table       | usertable@primary
            | spans       | /"key"/6/1-/"key"/6/2
```

This becomes very important when running YCSB with a column family per
field and with implicit SELECT FOR UPDATE (see cockroachdb#45159). Now that (as
of cockroachdb#45701) UPDATE statements acquire upgrade locks during their initial row
fetch, we don't want them acquiring upgrade locks on the primary column
family of a row in which they intend to update only a single column. That
would re-introduce the contention between writes to different columns of
the same row that column families helped avoid (see cockroachdb#32704). By marking each
column as NOT NULL, we can continue to avoid this contention.
Member Author

@nvanbenschoten nvanbenschoten left a comment


TFTRs!

In the release note, could you be a little more precise on when we implicitly apply FOR UPDATE (only when all filters are pushed down, also for UPSERT, etc)? Also, that we only support exclusive locks (auto-upgrade to that). Also, a way to use EXPLAIN to verify whether locks are implicitly acquired.

Done, except for the point about exclusive locks; that seems like an implementation detail that would confuse users.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/internal/client/batch.go, line 557 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

double space

Done.


pkg/internal/client/db.go, line 419 at r6 (raw file):

at least define forUpdate = true, notForUpdate = false constants.

Eh, we don't do this for isReverse and the callers are directly below.

Also curious why this is called "forUpdate", which is a SQL concept, while this is the KV API. `lock lock.Mode` comes to mind.

Yeah, I see what you're getting at here. We don't support any other locking strength right now though, so I'm not too inspired to change it. "forUpdate" doesn't seem like it needs to be a strict SQL concept if there's only a single locking strength that it can imply.


pkg/kv/txn_interceptor_pipeliner.go, line 184 at r6 (raw file):

Previously, tbg (Tobias Grieger) wrote…

But it shouldn't? ISTM that it should only track intents and that it doesn't even have to know that an intent comes with a lock built-in. I might be missing something. But I definitely don't see a reason it needs to keep track of unreplicated locks.

Done.


pkg/kv/txn_interceptor_pipeliner.go, line 182 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Ah, I missed earlier that the pipeliner keeps track of the locks solely for the purpose of handing them to EndTransaction in the end. That makes sense, though it would be nice to clarify where the locks matter (include in txn record) and where the writes do (pipeline).

Done.


pkg/kv/txn_interceptor_pipeliner.go, line 443 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Ah, now I understand. The footprint only has spans, it doesn't care whether a lock is replicated or not. It's just for cleanup.

Yes, I tried to clarify this in a few places. Sorry for the confusion.


pkg/kv/kvserver/replica_evaluate_test.go, line 279 at r13 (raw file):

a) have a limit (=acquire only locks for keys returned) and

Good idea. Done for both MaxSpanRequestKeys and TargetBytes.

b) return tombstones (do we put locks on the tombstone keys and is this intentional? Add commentary on the flag that turns tombstone retrieval on).

The flags aren't really in the same place. Tombstones is on storage.MVCCScanOptions and KeyLocking is on ScanRequest. They don't really interact because cmd_scan.go never sets Tombstones=true.

The behavior we get from this is intentional though - we don't lock tombstone keys, just like Postgres doesn't lock deleted rows.
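
(A sketch of the limit interaction covered by the new tests, at the client level; assumes the `Header` fields named below:)

```go
package example

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/internal/client"
)

// lockFirstKey runs a limited locking scan. With MaxSpanRequestKeys (or
// TargetBytes) set, the server acquires locks only on the keys it actually
// returns under the limit, not on the entire requested span.
func lockFirstKey(ctx context.Context, txn *client.Txn) error {
	b := txn.NewBatch()
	b.Header.MaxSpanRequestKeys = 1 // stop after the first key
	b.ScanForUpdate("a", "c")       // only the returned key gets locked
	return txn.Run(ctx, b)
}
```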


pkg/kv/kvserver/replica_evaluate_test.go, line 355 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

What hit this?

When I started running transactional batches I hit this.


pkg/sql/logictest/testdata/logic_test/fk, line 2662 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

If we acquired ranged locks on reads, would we just get a lock on b even if it's NULL, or would we still want to lock the PK?

I don't think I fully understand your question or how it relates to ranged locks, but I may have confused you about why the NOT NULL part is important. Whether we acquire locks over scan ranges or just points within that range doesn't really matter. The important part here is that NOT NULL allows us to not scan the primary column family at all. If b weren't marked as NOT NULL, we would need to scan the primary column family during the initial row fetch of the UPDATE, because b's family alone wouldn't tell us whether the row exists. That scan would have acquired an unreplicated lock on the primary column family, which would have blocked the INSERT. The optimizer knows that it only needs to update some columns and not others, but it would have been close to impossible to plumb this information down to the row fetcher and only set the KeyLocking mode on some of the scans. That also causes issues with span coalescing, because we would need to worry about two adjacent column families that want different locking strengths.

So the key here is that NOT NULL allows us to not scan the primary index at all when performing the UPDATE statement's implicit FOR UPDATE scan. This was enabled by #44239 and is already being exploited in #45755 with YCSB.

I added more to the comment to make this clearer.
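
(To make the column-family point concrete, a hypothetical schema in the spirit of the test:)

```sql
-- Hypothetical schema: b is NOT NULL and lives in its own column family,
-- so an UPDATE that touches only b can scan (and lock) just family f1,
-- leaving the primary column family unlocked for concurrent writers.
CREATE TABLE t (
    a INT PRIMARY KEY,
    b INT NOT NULL,
    c INT,
    FAMILY f0 (a, c),
    FAMILY f1 (b)
);

UPDATE t SET b = b + 1 WHERE a = 1;  -- implicit FOR UPDATE locks only f1
```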

Member

@tbg tbg left a comment


:lgtm:

Don't forget to update the PR message with the updated release note before merge (not sure it matters, but I think they will both get picked up and so they better be the same).

Reviewed 59 of 59 files at r14, 14 of 14 files at r15, 7 of 7 files at r16, 34 of 34 files at r17.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

This commit renames the IntentSpans field in Transaction, TransactionRecord
and EndTxnRequest from "IntentSpans" to "LockSpans". The field is intended
to carry the spans of all locks, not just those of write intents.
Purely a rename along with a bit of comment cleanup.
This ensures that the calls to set the system config trigger are made
before any part of the plan begins executing. The previous approach would
have become an issue after the next commit, because TableReader plan nodes
can now acquire locks, which means that they can cause a Txn to set its
key before the mutation plan has its `startExec` method called. That
prevented the call to `maybeSetSystemConfig` from working correctly on
statements like:
 `UPDATE system.zones SET config = $2 WHERE id = $1`

With this change, things work correctly again.
Closes cockroachdb#40205.
Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks.
It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which
allows their users to select the locking strength that they would like the
scan to use. This locking strength defaults to None, which corresponds to
the current behavior. However, some users will want to acquire locks on each
row scanned, which is now possible by setting the locking strength to a
stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported
all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit
(e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new
key-locking functionality in the KV API to hook up row-level locking,
completing the integration of SELECT FOR UPDATE with the KV layer and,
in particular, the new lock-table structure.

cockroachdb#43775 described the three main
benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that
the results continue to hold, and in some cases they have improved. I intend
to update this PR with a more complete analysis of its impact on those three
areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new
leaseholder-only locking mechanism. This allows the feature to be used
to improve the performance of transactions that read, modify, and write
contended rows. Similarly, UPDATE statements now use this new
mechanism by default, meaning that their performance under contention is
improved. This is only enabled for UPDATE statements that can push their
filter all the way into their key-value scan. To determine whether an
UPDATE statement is implicitly using SELECT FOR UPDATE locking, look
for a "locking strength" field in the EXPLAIN output for the statement.
@nvanbenschoten
Member Author

Don't forget to update the PR message with the updated release note before merge (not sure it matters, but I think they will both get picked up and so they better be the same).

Done.

@nvanbenschoten nvanbenschoten merged commit 43f498e into cockroachdb:master Mar 6, 2020
RichardJCai pushed a commit to RichardJCai/cockroach that referenced this pull request Mar 9, 2020
RichardJCai pushed a commit to RichardJCai/cockroach that referenced this pull request Mar 9, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/sfuAPI2 branch March 16, 2020 16:55