
kvprober: implement "shadow write" probes #67112

Closed
joshimhoff opened this issue Jul 1, 2021 · 19 comments
Assignees
Labels
A-kv-observability C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-sre For issues SRE opened or otherwise cares about tracking. T-kv KV Team

Comments

@joshimhoff
Collaborator

joshimhoff commented Jul 1, 2021

Is your feature request related to a problem? Please describe.
We have a kvprober that sends point read requests to "random" ranges. We should extend that prober to test the availability of a range at the write level. We can call this a "shadow write".

Describe the solution you'd like
Strawman proposal:

  1. Implement a raft command called Probe / ShadowWrite and make available via the kvclient public API.
  2. The MVP implementation of the command does nothing.
  3. Extend kvprober to make Probe / ShadowWrite requests to "random" ranges.

The test of kv is decent: the Probe / ShadowWrite command needs to get proposed, agreed upon, applied, etc. (Am I using these words correctly?) A write to the raft log will happen, so availability of the disk is checked.

The test of pebble is minimal, as no actual write happens at Probe command apply time. Note though that we could change this in future CRDB versions. One can imagine writing to pebble, but in a way that doesn't lead to user-visible side effects, in order to improve the realism of the probe (that is, to match the actual CRDB write codepath more closely).

CC @tbg @andreimatei @knz @bdarnell @jreut @logston for review of the strawman proposal. I hope for a naming bikeshed.

Also, KV folks: How hard of a time do you think I will have implementing this? It's hard for me to scope the part where we add the Probe / ShadowWrite command. My sense from talking with Ben a while back is that it's not really technically hard, but there's lots of boilerplate, and since a new command hasn't been added in a while it may be tricky to figure out all the places to make changes.

Describe alternatives you've considered

  • We should also implement the stuck applied index + failing probe alert, which has faster mean time to detect, so long as the symptom experienced is a stuck applied index. Can't find a link to an issue for that, but it has been discussed.
  • We should consider other approaches similar to the above, where some suspect internal detail (e.g. a stuck applied index) leads us to probe a specific range, yielding faster mean time to detect.

These aren't really alternatives, though. Blackbox approaches like this one are complemented by whitebox approaches.

Additional context
#61074
https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvprober/kvprober.go

Epic CC-4054

@joshimhoff joshimhoff added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-observability T-kv KV Team labels Jul 1, 2021
@joshimhoff joshimhoff self-assigned this Jul 1, 2021
@joshimhoff joshimhoff added the O-sre For issues SRE opened or otherwise cares about tracking. label Jul 1, 2021
@tbg
Member

tbg commented Jul 1, 2021

Can't find a link to an issue for that, but it has been discussed.

Here you go:

#61118
#33007

Also, KV folks: How hard of a time do you think I will have implementing this? It's hard for me to scope the part where we add the Probe / ShadowWrite command. My sense from talking with Ben a while back is that it's not really technically hard, but there's lots of boilerplate, and since a new command hasn't been added in a while it may be tricky to figure out all the places to make changes.

Just some boilerplate, but not much in terms of technical challenges. I think we don't even need to add a new command, though; it feels as though adding a new key like

cockroach/pkg/keys/keys.go

Lines 398 to 402 in a7472e3

// RangeDescriptorKey returns a range-local key for the descriptor
// for the range with specified key.
func RangeDescriptorKey(key roachpb.RKey) roachpb.Key {
	return MakeRangeKey(key, LocalRangeDescriptorSuffix, nil)
}

(with suffix "prbe") would ~get the job done. It could then also be used as the key read by the read prober. You can write that key through the existing KV API, so very little plumbing needed.

@joshimhoff
Collaborator Author

joshimhoff commented Jul 1, 2021

That would be awesome. In addition to being easier to implement, that way we actually write to pebble too.

Re: "range-local", I am reading:

// 3. (replicated) Range-ID local keys vs. Range local keys
//
// Deciding between replicated range-ID local keys and range local keys is not
// entirely straightforward, as the two key types serve similar purposes.
// Range-ID keys, as the name suggests, use the range-ID in the key. Range local
// keys instead use a key within the range bounds. Range-ID keys are not
// addressable whereas range-local keys are. Note that only addressable keys can
// be the target of KV operations, unaddressable keys can only be written as a
// side-effect of other KV operations. This can often makes the choice between
// the two clear (range descriptor keys needing to be addressable, and therefore
// being a range local key is one example of this). Not being addressable also
// implies not having multiple versions, and therefore never having intents.
//
// The "behavioral" difference between range local keys and range-id local keys
// is that range local keys split and merge along range boundaries while
// range-id local keys don't. We want to move as little data as possible during
// splits and merges (in fact, we don't re-write any data during splits), and
// that generally determines which data sits where. If we want the split point
// of a range to dictate where certain keys end up, then they're likely meant to
// be range local keys. If not, they're meant to be range-ID local keys. Any key
// we need to re-write during splits/merges will needs to go through Raft. We
// have limits set on the size of Raft proposals so we generally don’t want to
// be re-writing lots of data. Range lock keys (see below) are separate from
// range local keys, but behave similarly in that they split and merge along
// range boundaries.
//
// This naturally leads to range-id local keys being used to store metadata
// about a specific Range and range local keys being used to store metadata
// about specific "global" keys. Let us consider transaction record keys for
// example (ignoring for a second we also need them to be addressable). Hot
// ranges could potentially have lots of transaction keys. Keys destined for the
// RHS of the split need to be collocated with the RHS range. By categorizing
// them as as range local keys, we avoid needing to re-write them during splits
// as they automatically sort into the new range boundaries. If they were
// range-ID local keys, we'd have to update each transaction key with the new
// range ID..

Gonna need to read that a few times more...

How do you guarantee the range local key doesn't conflict with some other key, e.g. a key where we store SQL data?

@joshimhoff
Collaborator Author

joshimhoff commented Jul 1, 2021

I found:

// 2. Key Addressing

Which seems to answer my question. Reading.

@tbg
Member

tbg commented Jul 5, 2021

How do you guarantee the range local key doesn't conflict with some other key, e.g. a key where we store SQL data?

This is basically a key that is guaranteed to not have any function yet. It's in a "parallel plane" relative to the SQL data or anything else the tenant might write. (I think it isn't even allowed to).

@joshimhoff
Collaborator Author

Ack. Thanks. Will take a closer look soon. This seems like a great direction.

@joshimhoff
Collaborator Author

OK, so pattern matching a bit...

I add a new prefix here, maybe "prb" or "probe":

QueueLastProcessedKey, // "qlpt"

Then I add a function like this, maybe ProberKey:

// RangeDescriptorKey returns a range-local key for the descriptor

Then I call it from kvprober, passing in the start key of the range, which the Planner returns (https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvprober/planner.go#L27), similar to here:

key := keys.RangeDescriptorKey(desc.StartKey)

And I pass the returned key into methods like Get and Put on *DB, e.g. here in the existing kvprober code: https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvprober/kvprober.go#L205

Do I have that right?
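
Putting those pieces together, a rough sketch of the probe call site I have in mind (ProberKey being the hypothetical helper from above, and step the range returned by the Planner):

// Sketch only, inside the prober's probe function:
key := keys.ProberKey(step.StartKey)
// Reading an empty key is not an error; the Get still exercises the range.
_, err := db.Get(ctx, key)
return err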

Now things get very fuzzy for me... gonna ask Qs that may be obvious to help me learn...

I presume kvclient handles calling Addr somewhere, so the request gets routed to the correct range (i.e. it needs to "strip the range local prefix")?

func Addr(k roachpb.Key) (roachpb.RKey, error) {

I also presume that at a storage level, the key is written with the range local prefix as per below?

~/cockroach-data$ cockroach debug keys . | grep '/Local/Range/'
0,0 /Local/Range/Min/QueueLastProcessed/"consistencyChecker": 
1622840016.688671000,0 /Local/Range/Min/RangeDescriptor: 
1622840015.379174000,0 /Local/Range/Min/RangeDescriptor: 
1622840015.091356000,0 /Local/Range/Min/RangeDescriptor: 
1622840014.578085000,0 /Local/Range/Min/RangeDescriptor: 
1622839923.579545000,0 /Local/Range/Min/RangeDescriptor: 
0,0 /Local/Range/System/NodeLiveness/QueueLastProcessed/"consistencyChecker": 
1622840004.695575000,0 /Local/Range/System/NodeLiveness/RangeDescriptor: 
1622840003.978210000,0 /Local/Range/System/NodeLiveness/RangeDescriptor: 
1622840003.757172000,0 /Local/Range/System/NodeLiveness/RangeDescriptor: 
1622840002.862404000,0 /Local/Range/System/NodeLiveness/RangeDescriptor: 
1622839923.579545000,0 /Local/Range/System/NodeLiveness/RangeDescriptor: 

I am confused about how range merges & splits work. A change in range start & end key means a change in the key at which a range descriptor is stored, since part of the key is the range start key, right? Do we move these KV pairs at range merge & split time somehow? Would we need to do that with the prober KV pairs, or would kvprober simply write to whatever is the current key (as of planning time (more below)) and let GC clean up the old KV pairs? That is, no need to do anything special: just read & write the range local keys from kvprober.

Let's say there are two ranges:

  1. A starts at key "a" and ends at key "c".
  2. B starts at key "c" and ends at key "e".

These ranges get merged into a range A that starts at key "a" and ends at key "e".

Planning ran before the merge happened, returning range B with start key "c". So we try probing ProberKey("c"), even though "c" is no longer a start key since the ranges have been merged. Is this okay? The address of ProberKey("c") implies querying the merged range A, so no problem, right?

I guess it is also possible that the read side of the prober would read a key before it has been written by the write side of the prober, but reading an empty key is not an error, so that is okay.

@tbg
Member

tbg commented Jul 14, 2021

👋🏽

I add a new prefix here, maybe "prb" or "probe":

four chars please, which is why I suggested prbe

Then I add a function like this, maybe ProberKey:

RangeProbeKey sounds good to me.

I presume kvclient handles calling Addr somewhere, so it gets routed to correct range (need to "strip the range local prefix")?

Yes, this is all done transparently. For example here descKey is a RangeDescriptorKey of someone:

func updateRangeDescriptor(
	ctx context.Context,
	b *kv.Batch,
	descKey roachpb.Key,
	oldValue []byte,
	newDesc *roachpb.RangeDescriptor,
) error {
	// This is subtle: []byte(nil) != interface{}(nil). A []byte(nil) refers to
	// an empty value. An interface{}(nil) refers to a non-existent value. So
	// we're careful to construct interface{}(nil)s when newDesc/oldDesc are nil.
	var newValue interface{}
	if newDesc != nil {
		if err := newDesc.Validate(); err != nil {
			return errors.Wrapf(err, "validating new descriptor %+v (old descriptor is %+v)",
				newDesc, oldValue)
		}
		newBytes, err := protoutil.Marshal(newDesc)
		if err != nil {
			return err
		}
		newValue = newBytes
	}
	b.CPut(descKey, newValue, oldValue)
	return nil
}

I am confused how range merges & splits work.

The start key of a range is immutable while the range exists (splits never change the start key, and merges extend the end key, i.e. the right hand side stops existing). So it's possible that you would probe r1, then r1 subsumes r2, but you still probe r2 (which then writes an extra key in the middle of r1 that wouldn't be anyone's responsibility to clean up). This is mildly annoying, but something you can do here is to not actually write anything! There are two options:

  1. begin; put(key); read(key); abort
  2. begin; put(key); del(key); commit

I think I prefer the second option. The first one has this mysterious read (which you need in order to block the txn on replicating the previous write; otherwise it'll sneakily not wait for the write to complete before you abort the txn, being none the wiser that it didn't go through). The second one avoids that bit of complexity and actually commits a txn, which is nice. I'd have to check whether it actually leaves a tombstone at the MVCC layer, but it doesn't really matter; it definitely blocks on replication.
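
A minimal sketch of option 2 on top of the kv client API (key is the range-local probe key discussed above; the value written is arbitrary since it's deleted in the same txn):

err := db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	// The put forces the txn through a full round of replication before it can commit.
	if err := txn.Put(ctx, key, "probe"); err != nil {
		return err
	}
	// The del ensures no live data is left behind at the probe key.
	return txn.Del(ctx, key)
})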

@joshimhoff
Collaborator Author

Hi, Tobias!

This is mildly annoying, but something you can do here is to not actually write anything! There are two options:

  1. begin; put(key); read(key); abort
  2. begin; put(key); del(key); commit

Very fun. Yay to transactions.

So it's possible that you would probe r1, then r1 subsumes r2, but you still probe r2 (which then writes an extra key in the middle of r1 that wouldn't be anyone's responsibility to clean up).

Ack, yes, I think this is what I was thinking about. Would GC clean up this key eventually, or does GC not operate on these range local key pairs? Not saying it is a good idea to rely on GC; more wondering whether GC operates on this part of the keyspace.

I'd have to check whether it actually leaves a tombstone at the MVCC layer but it doesn't really matter, it definitely blocks on replication.

Would be nice if a write was made to storage IMO (that is, would be nice if a tombstone was left). A bit more coverage that way? Or maybe, given the details above, it's not so important?

@tbg
Member

tbg commented Jul 14, 2021

GC never cleans up live values, and this value would be live, so GC would never touch it (it is otherwise a regular value to GC, i.e. no special casing). It would be around "forever". So, good to avoid this scenario via the txn option.

Would be nice if a write was made to storage IMO (that is, would be nice if a tombstone was left). A bit more coverage that way? Or maybe given details not so important?

Not so important I think. What matters is a full trip through the replication layer (leaving an intent, which also does all of the writing to storage that exists), reading that back, and making it through committing the txn record. Option 2) achieves all of the above.

@tbg
Member

tbg commented Jul 14, 2021

Oh and btw, this is also a full read probe already, i.e. unless you somehow want to read probe much more aggressively than write probe, I think this could easily be the only kind of probe and that would be totally ok.

@joshimhoff
Collaborator Author

Nice! All makes sense. This is awesome. Thanks for the help as always.

@joshimhoff
Collaborator Author

Oh and btw, this is also a full read probe already, i.e. unless you somehow want to read probe much more aggressively than write probe, I think this could easily be the only kind of probe and that would be totally ok.

I'm thinking more about this. My gut says it's nice to have separate measurements of read & write availability, but I can't entirely put my finger on why I feel that way.

Perhaps part of it is that two probe types may provide useful data during an incident. If read is green but write is red, that means a different set of potential prod issues than if both read & write are red.

Similarly, when writing postmortems, it may be nice to be able to speak about both read & write availability.

Perhaps the possibility of sending a higher rate of reads than writes per second is a piece of the puzzle also.

Lastly, I think we also want a scan at historical timestamp prober, as that will provide a good production vet of pebble especially. So the kvprober code already needs to be flexible enough to allow for multiple probe types.

Any thoughts, Tobias?

@tbg
Member

tbg commented Jul 20, 2021

These are reasonable points, except this one:

Lastly, I think we also want a scan at historical timestamp prober, as that will provide a good production vet of pebble especially. So the kvprober code already needs to be flexible enough to allow for multiple probe types.

This will exercise follower reads, but has no particular coverage of pebble. To probe pebble, a) you want to know what node you're talking to (which kvprober doesn't; it operates above the abstraction of "range"), and b) you want to do something better than ask pebble for that one key! You'd pick a random SST from the manifest and seek somewhere into it, or something like that.

Also, re: "there will be multiple things we want to probe for so kvprober should bake in reads and writes", wouldn't it be the opposite? We don't want an ever-proliferating set of cluster settings and knobs, do we? Or at least we wouldn't want to commit to that too early.

@joshimhoff
Collaborator Author

joshimhoff commented Jul 20, 2021

Will open a separate issue with both you & storage on it, but I'm thinking the scan prober would scan the whole range at a historical timestamp and export metrics re: the bandwidth at which live data (at the historical timestamp) was scanned. I'd expect a big enough dip in that bandwidth to indicate a production issue, and the root cause would often be pebble (e.g. an "inverted" LSM) or else infra issues (e.g. an issue with the disk leading to lower perf than expected). Does that seem wrong?

I hear you saying we can get a better production vet of pebble by probing based on knowledge of pebble internals ("pick an SST", a concept not exposed by either the kvclient API or the storage API). I think this may be true, but perhaps the place to start is a higher-level signal of cluster health, i.e. some signal based on the kvclient API, like the one proposed above.

Agree? Disagree? Half disagree / half agree?

Also, re: "there will be multiple things we want to probe for so kvprober should bake in reads and writes", wouldn't it be the opposite? We don't want an ever-proliferating set of cluster settings and knobs, do we? Or at least we wouldn't want to commit to that too early.

Ya, you're right. My thinking got a little twisted there. We only want the complexity of multiple probe types when it's worth the costs.

@tbg
Member

tbg commented Jul 20, 2021 via email

@joshimhoff
Collaborator Author

joshimhoff commented Jul 20, 2021

Ack. Will digest this comment a bit, think more in shower :), & start more discussion in another place. Thanks, Tobias.

@joshimhoff
Collaborator Author

I am working on this now.

@joshimhoff
Collaborator Author

Just put up a PR that adds a range-local key dedicated to blackbox probing: #68645.

I have a hacked POC of the "shadow write" functionality, and it appears to be working:

+++ b/pkg/kv/kvprober/kvprober.go
@@ -168,7 +168,7 @@ type dbGet interface {
 }
 
 // Doesn't return an error. Instead increments error type specific metrics.
-func (p *Prober) probe(ctx context.Context, db dbGet) {
+func (p *Prober) probe(ctx context.Context, db *kv.DB) {
        defer logcrash.RecoverAndReportNonfatalPanic(ctx, &p.settings.SV)
 
        if !readEnabled.Get(&p.settings.SV) {
@@ -203,8 +203,16 @@ func (p *Prober) probe(ctx context.Context, db dbGet) {
                // but that is okay. Even if there is no data at the key, the prober still
                // executes a basic read operation on the range.
                // TODO(josh): Trace the probes.
-               _, err = db.Get(ctx, keys.RangeProbeKey(step.StartKey))
-               return err
+               return db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
+                       k := keys.RangeProbeKey(step.StartKey)
+                       if err := txn.Put(ctx, k, "blah"); err != nil {
+                               return err
+                       }
+                       if err := txn.Del(ctx, k); err != nil {
+                               return err
+                       }
+                       return nil
+               })
        })
        if err != nil {
                // TODO(josh): Write structured events with log.Structured.
diff --git a/pkg/kv/kvprober/kvprober_integration_test.go b/pkg/kv/kvprober/kvprober_integration_test.go
index 3f52c66899..1cc4c6ede9 100644
--- a/pkg/kv/kvprober/kvprober_integration_test.go
+++ b/pkg/kv/kvprober/kvprober_integration_test.go
@@ -122,7 +122,7 @@ func TestProberDoesReads(t *testing.T) {
                                                for _, ru := range ba.Requests {
                                                        // Planning depends on Scan so only returning an error on Get
                                                        // keeps planning working.
-                                                       if ru.GetGet() != nil {
+                                                       if ru.GetPut() != nil {
                                                                return roachpb.NewError(fmt.Errorf("boom"))
                                                        }
                                                }

I will turn this into a proper PR now. I will leave read probes intact as per the above discussion about value of having both.

craig bot pushed a commit that referenced this issue Aug 19, 2021
66893: cli,storage: add emergency ballast  r=jbowens a=jbowens

Add an automatically created, on-by-default emergency ballast file. This
new ballast defaults to the minimum of 1% total disk capacity or 1GiB.
The size of the ballast may be configured via the `--store` flag with a
`ballast-size` field, accepting the same value formats as the `size`
field.

The ballast is automatically created when either available disk space is
at least four times the ballast size, or when available disk space after
creating the ballast is at least 10 GiB. Creation of the ballast happens
either when the engine is opened or during the periodic Capacity
calculations driven by the `kvserver.Store`.

During node start, if available disk space is less than or equal to half
the ballast size, exit immediately with a new Disk Full (10) exit code.

See #66493.

Release note (ops change): Add an automatically created, on by default
emergency ballast file. This new ballast defaults to the minimum of 1%
total disk capacity or 1GiB.  The size of the ballast may be configured
via the `--store` flag with a `ballast-size` field, accepting the same
value formats as the `size` field. Also, add a new Disk Full (10) exit
code that indicates that the node exited because disk space on at least
one store is exhausted. On node start, if any store has less than half
the ballast's size bytes available, the node immediately exits with the
Disk Full (10) exit code. The operator may manually remove the
configured ballast (assuming they haven't already) to allow the node to
start, and they can take action to remedy the disk space exhaustion. The
ballast will automatically be recreated when available disk space is 4x
the ballast size, or at least 10 GiB is available after the ballast is
created.
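
For illustration, a hedged example of the new field (the store path is a placeholder; per the release note, ballast-size accepts the same formats as the size field, so both absolute and percentage values should work):

--store=path=/mnt/data1,ballast-size=1GiB
--store=path=/mnt/data1,ballast-size=1%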

68645: keys/kvprober: introduce a range-local key for probing, use from kvprober r=tbg a=joshimhoff

This work sets the stage for extending `kvprober` to do writes as is discussed in detail with @tbg at #67112.

**keys: add a range-local key for probing**

This commit introduces a range-local key for probing. The key will
only be used by probing components like kvprober. This means no
contention with user-traffic or other CRDB components. This key also provides
a safe place to write to in order to test write availabilty. A kvprober that
does writes is coming soon.

Release note: None.

**kvprober: probe the range-local key dedicated to probing**

Before this commit, kvprober probed the start key of a range. This worked okay,
as kvprober only did reads, and contention issues leading to false positive
pages haven't happened in practice. But contention issues are possible,
as there may be data located at the start key of the range.

With this commit, kvprober probes the range-local key dedicated to
probing. No contention issues are possible, as that key is only for
probing. This key is also needed for write probes, which are coming soon.

Release note: None.

69164: Revert "backupccl: protect entire keyspan during cluster backup" r=dt a=adityamaru

This reverts commit 1b5fd4f.

The commit above laid a pts record over the entire table keyspace.
This did not account for two things (with the potential of there being
more):

1. System tables that we do not backup could have a short GC TTL, and
so incremental backups that attempt to protect from `StartTime` of the
previous backup would fail.

2. Dropped tables often have a short GC TTL to clear data once they have
been dropped. This change would also attempt to protect "dropped but not
gc'ed tables" even though we exclude them from the backup, and fail on
pts verification.

One suggested approach is to exclude all objects we do not backup by
subtracting these spans from {TableDataMin, TableDataMax}. This works
for system tables, and dropped but not gc'ed tables, but breaks for
dropped and gc'ed tables. A pts verification would still find the leaseholder
of the empty span and attempt to protect below the gc threshold.

In conclusion, we need to think about the semantics a little more before
we rush to protect a single key span.

Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Josh Imhoff <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
@joshimhoff
Collaborator Author

#69035

craig bot pushed a commit that referenced this issue Aug 25, 2021
69035: kvprober: extend kvprober to test KV's write codepaths r=joshimhoff a=joshimhoff

Design discussed in detail at #67112.

**kvprober: extend kvprober to test KV's write codepaths**

Before this commit, kvprober did point reads of range-local keys dedicated to
probing. kvprober did not exercise KV's write codepaths.

With this commit, kvprober exercises KV's write codepaths. kvprober has two
probe loops, one that does point reads & another that commits a txn that puts
and deletes a key, leaving no live data but exercising the write codepaths.

Release justification: Improvement to observability used only by CC SREs.

Release note: None.



69041: sqlproxyccl: use tokenbucket to throttle connection attempts r=JeffSwenson a=JeffSwenson

Previously, the sqlproxy used exponential backoff to reject connection
attempts if the source IP previously made an invalid connection attempt.
A map was used to track previously successful (source IP, destination
cluster) pairs. The throttle was only applied to unsuccessful pairs.

Now a token bucket is used to throttle connections. If a connection is
successful, the token is returned to the bucket. This should avoid
throttling well-behaved users and provide tools to throttle
abuse.
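
To illustrate the idea, a toy Go sketch (not the actual sqlproxyccl code; names are made up, and a real token bucket would also refill over time) in which each attempt takes a token and only successful attempts return it:

// throttle is a toy per-client token bucket: failed connection attempts
// drain tokens, successful ones give their token back, so well-behaved
// clients are effectively never throttled.
type throttle struct {
	mu     sync.Mutex
	tokens int
	max    int
}

func newThrottle(max int) *throttle {
	return &throttle{tokens: max, max: max}
}

// allowAttempt reports whether a new connection attempt may proceed,
// consuming a token if so.
func (t *throttle) allowAttempt() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.tokens == 0 {
		return false
	}
	t.tokens--
	return true
}

// reportSuccess returns the token consumed by a successful attempt.
func (t *throttle) reportSuccess() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.tokens < t.max {
		t.tokens++
	}
}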

The ConnectionCache was moved into the throttle service in order to
group the throttling logic into a single module.

The general intent of the change is to allow more connection attempts
before throttling kicks in. The main issues with the old approach were:
1. Testers would regularly lock themselves out by configuring an
   incorrect password. The max backoff was too aggressive and
   testers would get stuck waiting minutes to hours for the throttle
   to end.
2. It's unclear how to handle racing connection attempts. Currently all
   connection attempts before the first success trigger the throttling
   code path. Using a token bucket naturally allows us to bound the
   total number of attempts the system lets through.

Release note: None

69043: sql: implement `GENERATED ... AS IDENTITY` functionality for `INSERT/UPDATE/UPSERT` r=ZhouXing19 a=ZhouXing19

This commit is to implement the functionality of
GENERATED {ALWAYS | BY DEFAULT} AS IDENTITY
for INSERT/UPDATE/UPSERT statement.

In PostgreSQL, if a column is created with
GENERATED ALWAYS AS IDENTITY token,

  1. it can only be updated to DEFAULT;
  2. when executing INSERT on this column, the conflict
  cannot be resolved by `ON CONFLICT` statement;
  3. it cannot be written explicitly; (i.e. cannot INSERT/UPSERT
  without the OVERRIDING SYSTEM VALUE token);
  4. it is implicitly NOT NULL, but we must INSERT/UPSERT
  without specifying the value for it.

If a column is created with
`GENERATED BY DEFAULT AS IDENTITY` token,
the above restrictions do not apply to it.

Tests for optbuilder support is added in
pkg/sql/opt/optbuilder/testdata.

This commit also adds a type restriction for
GENERATED {ALWAYS | BY DEFAULT} AS IDENTITY
syntax under CREATE TABLE -- this column can only be of smallint,
integer, or bigint type.

This matches the PostgreSQL syntax:
https://www.postgresql.org/docs/current/sql-createtable.html

Release note: None

Release justification: although this supports a new feature,
the changes are high-benefit and have been under scrutiny for many weeks now,
so after discussing with other leads we feel fine about merging this during
stability period.

69296: build,ccl: quash a few outdated references to `libroach` r=rail a=rickystewart

`libroach` doesn't exist any more.

Release justification: Non-production code change
Release note: None

69313: workload: reintroduce old `prepare` behavior r=ajwerner a=rafiss

This reverts commit f446498,
and makes the 'prepare' setting work more closely to how it used to.

This is needed because we are seeing that the schemachange workload is
deallocating huge amounts of automatically prepared statements. By
explicitly preparing statements again, they should no longer be
deallocated.

This also disables the automatic prepared statement cache in pgx entirely,
which was enabled in jackc/pgx@0c3e59b

Release justification: test-only change
Release note: None

69341: ccl/backupccl: make TestBackupRestoreTenant faster r=adityamaru a=sajjadrizvi

TestBackupRestoreTenant/restore-tenant10-to-latest is slow
due to a long adopt interval. This commit uses a testing knob
to shorten test time.

Release justification: low risk modification to reduce test time

Release note: None

Co-authored-by: Josh Imhoff <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Sajjad Rizvi <[email protected]>