using primary and slave DBs to solve the panic problem caused by DB c… #830

yingchunliu-zte · 2024-09-10T08:13:44Z

When the system is powered off and reset, containerd experiences a panic:
panic: freepages: failed to get all reachable pages (key[0]=(hex)** on leaf page(1229) needs to be < than key of the next element in ancestor(hex)**. Pages stack: [1974,1229])

The main steps for using primary and slave DBs to solve the panic problem caused by DB conflicts are as follows:

1. Try to open the db. If that succeeds, use it as the master db and copy it to create the slave db. Otherwise, promote the slave db to master and copy the new master to create a fresh slave db.
2. Open the slave db as DB.slave.
3. Add slave members to the DB, Tx, and Bucket objects and override their write methods: after an operation on the main object succeeds, the slave object performs the same operation.

If the system loses power while the master db is being written and this causes a master db conflict (corruption), containerd will use the slave db after the panic. If power is lost while the slave db is being written, containerd will create a new slave db from the master.
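For illustration only, the open-time fallback described in the steps above could be sketched roughly as follows. This is not the code from this PR: `validate`, `copyFile`, `openWithFallback`, `demo`, and the `GOOD` magic header are all hypothetical names invented for the sketch, and the check is a stand-in for whatever consistency check a real database open performs. A production version would also need to fsync the copies.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// magic is a made-up header this sketch uses to tell a healthy file from a
// damaged one. A real database would run its own consistency check instead.
var magic = []byte("GOOD")

// validate is a hypothetical stand-in for "open the db and verify it".
func validate(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if !bytes.HasPrefix(data, magic) {
		return errors.New("corrupt file: " + path)
	}
	return nil
}

// copyFile overwrites dst with the contents of src (no fsync in this sketch).
func copyFile(src, dst string) error {
	data, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	return os.WriteFile(dst, data, 0o600)
}

// openWithFallback reports whether the replica had to be used. On success,
// both files hold the same valid contents.
func openWithFallback(primary, replica string) (usedReplica bool, err error) {
	if err := validate(primary); err == nil {
		// Primary is healthy: refresh the replica from it.
		return false, copyFile(primary, replica)
	}
	if err := validate(replica); err != nil {
		return false, fmt.Errorf("primary and replica both unusable: %w", err)
	}
	// Primary is damaged but the replica is healthy: restore the primary.
	return true, copyFile(replica, primary)
}

// demo builds a damaged primary and a healthy replica in a temp directory,
// runs the fallback, and returns what the primary contains afterwards.
func demo() (usedReplica bool, primaryAfter string, err error) {
	dir, err := os.MkdirTemp("", "fallback-sketch")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)

	primary := filepath.Join(dir, "db")
	replica := filepath.Join(dir, "db.replica")
	if err := os.WriteFile(primary, []byte("torn write"), 0o600); err != nil {
		return false, "", err
	}
	if err := os.WriteFile(replica, []byte("GOOD k=v"), 0o600); err != nil {
		return false, "", err
	}

	usedReplica, err = openWithFallback(primary, replica)
	if err != nil {
		return false, "", err
	}
	data, err := os.ReadFile(primary)
	return usedReplica, string(data), err
}
```

Calling `demo()` exercises the case where the primary is damaged and the replica is intact. Note that the sketch only covers the open path; it says nothing about keeping the two files consistent while writes are in flight.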

kind: bug

k8s-ci-robot · 2024-09-10T08:13:48Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yingchunliu-zte
Once this PR has been reviewed and has the lgtm label, please assign ptabor for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…onflicts Signed-off-by: yingchunliu-zte <[email protected]>

tjungblu · 2024-09-10T09:14:39Z

Thanks @yingchunliu-zte - that approach already exists (in smaller scale) internally via the meta pages:
https://github.com/etcd-io/bbolt/blob/main/db.go#L1123-L1144

You can see in the recent investigation by @ahrtr that simply rolling back the page helps in recovering:
#778 (comment)

There's another scenario where we believe some pages didn't persist properly during power-down events on virtualized filesystems. I don't think that copying an entire bucket or database file is helping here at all.
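To make the meta-page idea concrete, here is a deliberately simplified model of it. It is a sketch, not bbolt's actual code: the `meta` struct below carries only a transaction id, a root pointer, and a checksum, and `newMeta` and `pickMeta` are names made up for this example. The assumption it illustrates is that two meta pages are kept, each protected by a checksum, and that on open the valid page with the highest transaction id wins.

```go
package main

import (
	"encoding/binary"
	"errors"
	"hash/fnv"
)

// meta is a simplified stand-in for a meta page: it records which transaction
// wrote it, where the root of the data lives, and a checksum over both fields.
type meta struct {
	txid     uint64
	root     uint64
	checksum uint64
}

// sum computes an FNV-1a hash over the txid and root fields.
func (m meta) sum() uint64 {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[0:8], m.txid)
	binary.LittleEndian.PutUint64(buf[8:16], m.root)
	h := fnv.New64a()
	h.Write(buf[:])
	return h.Sum64()
}

// valid reports whether the stored checksum matches the page contents.
func (m meta) valid() bool { return m.checksum == m.sum() }

// newMeta returns a meta page with its checksum filled in.
func newMeta(txid, root uint64) meta {
	m := meta{txid: txid, root: root}
	m.checksum = m.sum()
	return m
}

// pickMeta prefers the page with the higher txid and falls back to the other
// page if the newer one fails its checksum (for example after a torn write).
func pickMeta(a, b meta) (meta, error) {
	newer, older := a, b
	if b.txid > a.txid {
		newer, older = b, a
	}
	if newer.valid() {
		return newer, nil
	}
	if older.valid() {
		return older, nil
	}
	return meta{}, errors.New("both meta pages are invalid")
}
```

In this model, a crash that tears the newer meta page simply rolls the database back to the previous transaction. It does not help if data pages referenced by a valid meta page never reached the disk, which is the second scenario described above.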

yingchunliu-zte · 2024-09-10T09:35:13Z

Thanks @tjungblu

Commit is not an atomic operation: when the system loses power, the flush of the db to disk may be incomplete.
With this change, every write operation is also applied to the redundant backup database, so at commit time there is always one good database available for automatic recovery, which avoids a repeated panic.

ahrtr · 2024-09-10T09:37:10Z

Yes, basically I agree with @tjungblu. Please also refer to https://github.com/ahrtr/etcd-issues/blob/master/docs/cncf_storage_tag_etcd.md#storage-boltdb-feature

On the other hand, it's totally up to applications to add whatever higher-level protection (e.g. master-slave) they want, though it may not be an easy task.

From bbolt's perspective, there are indeed some long-standing data corruption issues. One possible cause is the filesystem, as mentioned in #778 (comment). But it's also possible that there are bugs in the freelist management; refer to #789. I am open to any thoughts on how to resolve such data corruption issues.

ahrtr · 2024-09-10T09:41:17Z

> Commit is not an atomic operation

It's atomic. Please refer to the link in my previous comment.

To be clearer, we won't accept this PR but thanks anyway.

Please feel free to raise a topic in discussions if you want.

tjungblu · 2024-09-10T09:55:38Z

> Commit is not an atomic operation

> Every write operation here will be written to the redundant backup database.

If commit were not atomic, you would now have a two-phase commit problem to solve, without an actual commit to coordinate the two databases. I hope you see where this is going :)

> Please feel free to raise a topic in discussions if you want.

+1, happy to brainstorm this further along
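A toy model of the concern, for anyone following along: the sketch below mirrors a write to two in-memory maps and injects a crash between the two writes. Everything in it (`store`, `mirroredPut`, `diverged`, `errCrash`) is hypothetical and exists only to illustrate the point; it is not how the PR or bbolt is implemented.

```go
package main

import "errors"

// store is an in-memory stand-in for one database file.
type store map[string]string

// errCrash marks the simulated power loss.
var errCrash = errors.New("simulated power loss")

// mirroredPut writes to the primary first and then to the replica.
// crashBetween simulates losing power after the first write only.
func mirroredPut(primary, replica store, key, value string, crashBetween bool) error {
	primary[key] = value
	if crashBetween {
		return errCrash
	}
	replica[key] = value
	return nil
}

// diverged reports whether the two copies disagree about key.
func diverged(primary, replica store, key string) bool {
	return primary[key] != replica[key]
}
```

After a crash between the two writes, the copies disagree, and nothing in the data itself says which copy reflects the intended state. Deciding that requires some form of commit record shared by both, which is the two-phase commit problem again.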

yingchunliu-zte · 2024-09-11T02:32:38Z

@tjungblu @ahrtr
The commit I am referring to is non-atomic in the sense that some pages may be flushed to disk successfully while others are not. In that case, a master/slave db may help.

k8s-ci-robot added the size/L label Sep 10, 2024

yingchunliu-zte force-pushed the panic branch 3 times, most recently from 4a5d0a5 to 5b80a56 Compare September 10, 2024 08:26

using primary and slave DBs to solve the panic problem caused by DB c…

415cfe3

…onflicts Signed-off-by: yingchunliu-zte <[email protected]>

yingchunliu-zte force-pushed the panic branch from 5b80a56 to 415cfe3 Compare September 10, 2024 08:50

ahrtr closed this Sep 10, 2024

yingchunliu-zte deleted the panic branch September 12, 2024 02:00
