
Non-zero instance start time #3901

Merged
rnewson merged 1 commit into 3.x from nonzero_instance_start_time on Jan 19, 2022

Conversation

rnewson (Member) commented Jan 16, 2022

Set instance_start_time to the creation time of the database to
restore the ability for the replicator to detect a db recreation event
during a replication. Without this, a replication can fail to notice
that the db was deleted and recreated, write the checkpoint document
for the in-progress replication and keep going, when it should instead
reset and start again from sequence 0.

Resolves #3897

janl (Member) commented Jan 18, 2022

lgtm overall, I just have a question about something I'm not too familiar with: since you are updating a record, do we have to worry about mixed clusters running one version vs the other? I would assume not, since shards are mostly node-local, but there might be some edge case?

rnewson (Member, Author) commented Jan 18, 2022

I don't think shard records are transmitted between nodes but it's a good question. @nickva thoughts?

nickva (Contributor) commented Jan 18, 2022

I couldn't find any place where we pass the #shard record to the workers. I looked mainly in fabric and mem3_rep. The typical pattern is that we spawn the workers based on the local shard map in submit_jobs but don't pass the actual shard records. It's not a 100% guarantee, though; I only skimmed through.

However, I had noticed that the suffix is already part of the shard name and is a tad redundant:

 mem3:shards(<<"db1">>).

[#shard{name = <<"shards/00000000-7fffffff/db1.1642540491">>,
        node = '[email protected]',dbname = <<"db1">>,
        range = [0,2147483647],
        ref = undefined,suffix = <<".1642540491">>,
        opts = [{props,[]}]},
        ...

And there is already a function in mem3 which returns the suffix. Perhaps we could use it:

 mem3:shard_suffix(<<"db1">>).
".1642540491"

We already call it when building an ETag in chttpd_db and in a few other places.

nickva (Contributor) left a review comment:

Looks good. It's kind of a hack, but 1) it's backward compatible with existing replicators and 2) it's compatible with the already-created dbs in 2.x/3.x.

Ideally, we'd implement a db uuid field like we did in main, return it in the db info / _ensure_full_commit responses, and then make the replicator check for source/target uuid changes as described in #3897. However, we can still do that in the future, and in the meantime this PR should significantly reduce [*] the chance of the target db getting re-created without the replicator noticing.

A few notes:

  • Since we're re-using the old instance_start_time field and it's still a timestamp, we might want to remove the "." in front of the suffix, just in case there is some type checking somewhere in a client library which asserts a numeric format here. There is already a mem3:shard_suffix/1, so we could add a mem3:shard_creation_time/1 (a rough sketch follows after these notes).

  • Even though both the old (<2.x) and the new instance_start_time are timestamps, the old ones were updated whenever the db was opened (when the endpoint server was restarted, for example). The unix time suffix in our case stays the same for the life of that database instance until it's re-created. We should verify that the replicator will do the right thing in light of that difference. I think it would; just leaving it here as a note for any reviewers. The relevant code is this replicator section.

  • We should probably update the replicator endpoint instance start time mismatch logs to suggest a better reason for the failure. The "Try to increase max_dbs_open at the target's server." message might not make sense any longer.

[*] We are not completely eliminating the chance as it's still possible that the db would be recreated in the small (under a second) interval between _ensure_full_commit and writing the checkpoint _local document.
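
As a rough sketch of the first note (assumed name, not the merged implementation, and assuming mem3:shard_suffix/1 returns a string such as ".1642540491" as in the shell output above), the helper could simply strip the leading dot:

 %% Hypothetical helper, not the actual patch: derive a numeric-looking
 %% instance_start_time value from the shard suffix by dropping the leading ".".
 shard_creation_time(DbName) ->
     case mem3:shard_suffix(DbName) of
         [$. | Timestamp] -> list_to_binary(Timestamp);  % ".1642540491" -> <<"1642540491">>
         _ -> <<"0">>                                    % fall back to the old value
     end.

That would keep the field a plain decimal string for any client library that parses it as a number.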

rnewson force-pushed the nonzero_instance_start_time branch 3 times, most recently from 3882a61 to 475de35 on January 18, 2022 22:45
nickva (Contributor) commented Jan 18, 2022

Verified by re-creating the replication target. The logs look a bit messy; I got a few of these in a row:

[error] 2022-01-18T22:59:19.818867Z [email protected] <0.9133.0> -------- CRASH REPORT Process (<0.9133.0>) with 1 neighbors exited with reason:
{checkpoint_commit_failure,<<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>} at gen_server:decode_msg/9(line:475) <= proc_lib:init_p_do_apply/3(line:226); initial_call: {couch_replicator_worker,init,['Argument__1']}, ancestors: [<0.9039.0>,couch_replicator_scheduler_sup,couch_replicator_sup,...], message_queue_len: 0, links: [<0.9134.0>], dictionary: [{last_stats_report,{1642,546729,944177}}], trap_exit: true, status: running, heap_size: 1598, stack_size: 28, reductions: 8883

Otherwise it works as intended. All the docs were re-replicated to the new target instance.

Noticed a failure for get_dbs_info in CI, and locally as well for the _dbs_info endpoint:

http $DB/_dbs_info
HTTP/1.1 500 Internal Server Error


{
    "error": "badmatch",
    "reason": "_dbs",
    "ref": 4194149092
}

Edit: probably something to do with this: https://github.com/apache/couchdb/blob/3.x/src/mem3/src/mem3.erl#L125-L149

rnewson force-pushed the nonzero_instance_start_time branch 3 times, most recently from 31b36fb to caae989 on January 19, 2022 17:04
Review comment on src/mem3/src/mem3.erl (outdated, resolved).
nickva (Contributor) left a review comment:

+1 well done!

We should check and see if the docs need updating as well. There was at least one mention in the replication protocol description that instance_start_time is always <<"0">>.

rnewson merged commit 3abcc42 into 3.x on Jan 19, 2022
rnewson deleted the nonzero_instance_start_time branch on January 19, 2022 19:04
xrobau (Contributor) commented May 24, 2023

This just appeared on the latest 3.3 build (3.3.2, to be exact) when we upgraded from 3.2.2-2~jammy.

Unfortunately, this caused ALL the client replicators to abort with an error:

  "state": "crashing",
  "info": {
    "error": "instance_start_time on source database has changed since last checkpoint."
  },

As 3.2.2-2 returned zero and 3.3.2 returns the correct time, I'm not really sure how to fix this without triggering a complete re-replication, which in a database with several hundred million entries is an annoyingly complex and slow process.

Edit: If anyone ended up here by googling the error, you can temporarily work around it by rolling back to 3.2.2-2 using the command apt install couch=3.2.2-2~jammy, and follow bug #4621.

nickva (Contributor) commented May 24, 2023

We might want to add a special case that compares <<"0">> against a non-zero value and lets that "pass" as a valid comparison, to handle the case where a cluster is being upgraded...

rnewson (Member, Author) commented May 25, 2023

instance_start_time changing only causes the replication job to exit. When it's restarted it will look for replication checkpoint docs as usual. In this circumstance they will exist, so replication resumes from there.

The "normal" (i.e, couchdb 1.0) reason for instance_start_time to change is the database was closed and reopened (or, more extremely, was deleted and re-created). So it is only a mechanism to ensure replication includes everything, it's not a reset.

nickva (Contributor) commented May 25, 2023

Agree with @rnewson, and I verified in #4621 that it doesn't reset the replication.

During cluster upgrades from 3.2 to 3.3 it does make a bit of a mess in the logs, so I created a PR to reduce that. Basically, it accepts 0 as a valid timestamp instead of crashing and restarting the job.
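
A minimal sketch of that special case (assumed function name, not necessarily the PR as written):

 %% Treat a <<"0">> reported by an older (pre-3.3) node as compatible with any
 %% current start time, so a rolling upgrade does not crash and restart the job.
 start_time_changed(Recorded, Current) ->
     case {Recorded, Current} of
         {Same, Same} -> false;   % unchanged: keep going
         {<<"0">>, _} -> false;   % older node recorded 0: accept it
         {_, <<"0">>} -> false;   % endpoint still reports 0: accept it
         {_, _}       -> true     % genuinely different: treat the db as recreated
     end.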
