ccl/backupccl: TestGCDropIndexSpanExpansion failed #75202
Test fails while executing the full backup, specifically when the backup processor attempts to export
ccl/backupccl.TestGCDropIndexSpanExpansion failed with artifacts on master @ ebda0ecb4aa1fe47f1403635846e342a2cfbfa1b:
See also: How To Investigate a Go Test Failure (internal)
I think this is my fault; I'll take this.
ccl/backupccl.TestGCDropIndexSpanExpansion failed with artifacts on master @ 72655fd4730fadb0ddacd8a5513a5d2d9121cbc6:
So it turns out this is actually not related to the ongoing PTS work; bisecting points to #73876. cc @irfansharif. I'm not exactly sure what behavior has changed, so I'm still looking into it.
So I've narrowed down what is going on a little more. The test was slightly incorrect in that it was setting a short GC TTL on the whole table instead of just on the index being dropped (foo@bar). Even after fixing this (https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/backupccl/backup_test.go#L9014), the last backup sometimes seems to issue an export request whose batch timestamp falls below the range's GCThreshold. With or without protected timestamps, this should not happen, since the default GC TTL is 25 hours.
After the GC job on the dropped index successfully runs GC (https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/backupccl/backup_test.go#L9034) and we run the third backup, the logs show a few interesting things. We seem to see some range split activity:
In the third backup, when we are about to process the export request to
However, when we go to evaluate the export request to
@irfansharif, does anything in the logs above stand out as an unexpected consequence of enabling span configs?
I suspect what is happening here is similar to #75436; let me try fixing it the same way.
Do you have a quick repro/branch I can use? It could be a few things:
Yup, if you apply this diff to fix the test, it fails under stressrace in ~200 runs on my gceworker.
FWIW, I also tried running with the closed timestamp interval set to 100ms and adding the
I don't think a retry would fix this, because an ExportRequest sets its
For my own understanding, is the "gc.ttlseconds = '1'" necessary for this test?
Yup, the GC job responsible for clearing the dropped index waits until the TTL has passed before actually running GC. So 1 second just ensures that the job completes in a reasonable timeframe. |
Confirming it repros reliably:
Running the test's (modified) SQL code locally to remind myself of SQL encoded partitions and the relevant configs:
CREATE DATABASE test;
USE test;
CREATE TABLE foo (id INT PRIMARY KEY, id2 INT, id3 INT, INDEX bar (id2), INDEX baz (id3));
ALTER INDEX foo@bar CONFIGURE ZONE USING gc.ttlseconds = '1';
INSERT INTO foo VALUES (1, 2, 3);
DROP INDEX foo@bar;
SELECT
  crdb_internal.pretty_key(start_key, -1),
  crdb_internal.pretty_key(end_key, -1),
  crdb_internal.pb_to_json('cockroach.roachpb.SpanConfig', config)->'gcPolicy'->>'ttlSeconds'
FROM system.span_configurations;
Adding some instrumentation, I'm seeing the following.
Specifically, we go through the following set of range splits:
When evaluating the export request ("ExportRequest for span /Table/56/{3-4}"), we see that we're actually hitting r45 before r46, which has the "default" TTL, has been fully split off from it. Specifically, the following is logged, in order:
This is basically just #71977, which has been true of CRDB since time immemorial. Because we process span configs one at a time, splitting ranges one at a time at the end of the key range, when initially carving out r45 to contain the index's data we first split on the left side of the key range, creating a range that encompasses keys after the index's data.
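A toy Go model of that intermediate state, assuming (for illustration only) that a range takes the span config covering its start key, and using pretty-printed strings in place of encoded keys. `spanConfig`, `configFor`, `rangeStartFor`, and `split` are hypothetical. Between the first split (at the left edge of foo@bar's span, /Table/56/2) and the second (at /Table/56/3), rows of the baz index sit in a range governed by the 1s-TTL config:

```go
package main

import (
	"fmt"
	"sort"
)

// spanConfig is a toy span config: keys in [start, end) get the given GC TTL.
type spanConfig struct {
	start, end string
	ttl        string
}

// configFor returns the TTL of the config covering key, or "default".
func configFor(configs []spanConfig, key string) string {
	for _, c := range configs {
		if key >= c.start && key < c.end {
			return c.ttl
		}
	}
	return "default"
}

// rangeStartFor returns the start key of the range containing key, given
// sorted split points (splits[0] must be <= key).
func rangeStartFor(splits []string, key string) string {
	i := sort.Search(len(splits), func(i int) bool { return splits[i] > key })
	return splits[i-1]
}

// split adds a split point and keeps the slice sorted.
func split(splits []string, key string) []string {
	splits = append(splits, key)
	sort.Strings(splits)
	return splits
}

func main() {
	// Index 2 (foo@bar) has a 1s TTL; everything else uses the default.
	configs := []spanConfig{{start: "/Table/56/2", end: "/Table/56/3", ttl: "1s"}}
	row := "/Table/56/3/1" // a key in the next index (baz)

	splits := []string{"/Table/56"}
	splits = split(splits, "/Table/56/2")                       // first split: left edge of the index span
	fmt.Println(configFor(configs, rangeStartFor(splits, row))) // 1s: baz's keys share the 1s-TTL range

	splits = split(splits, "/Table/56/3")                       // second split: right edge
	fmt.Println(configFor(configs, rangeStartFor(splits, row))) // default
}
```

If GC runs on the range during that window, data under /Table/56/3 is collected under the 1s TTL, which matches the export for /Table/56/{3-4} landing below the GC threshold.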
ccl/backupccl.TestGCDropIndexSpanExpansion failed with artifacts on master @ 421a767fa41a67efa157a96be8b8e43cdbc55b63:
75491: backupccl: deflake TestGCDropIndexSpanExpansion r=irfansharif a=adityamaru
With #73876, there is a bit more asynchrony than before, and thus the test must wait until all the ranges have completed splitting before it attempts the last backup, so that the ExportRequest targets the range with the correct SpanConfig applied to it.
Fixes: #75202
Release note: None
Co-authored-by: Aditya Maru
ccl/backupccl.TestGCDropIndexSpanExpansion failed with artifacts on master @ 46bdd4e299b3324eb65b943dc4a1c10ded7d8a7a: