ovn db: recover automatically on startup if db corruption is detected #1980

Merged: 1 commit, merged Oct 21, 2022

zhangzujian
Member

What type of PR is this?

  • Bug fixes

Which issue(s) this PR fixes:

Fixes #1968

zhangzujian added the labels "bug" (Something isn't working) and "need backport" on Oct 19, 2022
@zhangzujian
Member Author

Create a 3-node OVN cluster:

root@node1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
f987
Name: OVN_Northbound
Cluster ID: 39da (39da3e2a-2c7e-461c-9bc7-58527c22fe42)
Server ID: f987 (f9879d86-f776-4a13-a07a-626da622ef31)
Address: tcp:[172.20.0.3]:6643
Status: cluster member
Role: leader
Term: 1
Leader: self
Vote: self

Last Election started 76960 ms ago, reason: timeout
Last Election won: 76960 ms ago
Election timer: 5000
Log: [10, 110]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: <-9e28 ->9e28 <-a885 ->a885
Disconnections: 0
Servers:
    a885 (a885 at tcp:[172.20.0.2]:6643) next_index=110 match_index=109 last msg 1409 ms ago
    f987 (f987 at tcp:[172.20.0.3]:6643) (self) next_index=2 match_index=109
    9e28 (9e28 at tcp:[172.20.0.4]:6643) next_index=110 match_index=109 last msg 1409 ms ago

Stop server f987, replace the NB database file with a corrupted one, and start the server again:

root@k8s:/# kubectl -n kube-system logs ovn-central-7c7b56988c-xdv86
 * ovn-northd is not running
 * ovnnb_db is not running
 * ovnsb_db is not running
ovsdb-tool: /etc/ovn/ovnnb_db.db: record 16891 with index 73770 skips past expected index 73755
detected database corruption for file /etc/ovn/ovnnb_db.db, rebuild it.
get local server id f9879d86-f776-4a13-a07a-626da622ef31
local address: tcp:[172.20.0.3]:6643
remote addresses: tcp:[172.20.0.4]:6643 tcp:[172.20.0.2]:6643
generating new database file /etc/ovn/ovnnb_db.db.init-6d0645
backup /etc/ovn/ovnnb_db.db to /etc/ovn/ovnnb_db.db.backup-531b49
use new database file /etc/ovn/ovnnb_db.db.init-6d0645
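The log traces the startup flow: check the file, read the local server ID, generate a new database file that joins the existing cluster, back up the corrupted file, and switch to the new one. Below is a minimal POSIX shell sketch of that flow. It is not the code merged in this PR. It assumes the caller already knows the database name and the local and remote Raft addresses, and that the installed ovsdb-tool accepts --sid for join-cluster (check ovsdb-tool(1) for your version). The function name recover_if_corrupted and the OVSDB_TOOL override are hypothetical.

```shell
# Sketch of the startup check-and-rebuild flow traced in the log above.
# Assumptions (hypothetical, not the PR's code): the caller passes the DB file,
# the database name, the local Raft address and the remote addresses; the
# installed ovsdb-tool accepts --sid for join-cluster; OVSDB_TOOL may override
# the binary (e.g. for testing).
recover_if_corrupted() {
    db="$1"; name="$2"; local_addr="$3"; shift 3   # "$@" now holds the remotes
    tool="${OVSDB_TOOL:-ovsdb-tool}"

    # A passing consistency check means there is nothing to repair.
    if "$tool" check-cluster "$db" >/dev/null 2>&1; then
        return 0
    fi
    echo "detected database corruption for file $db, rebuild it."

    # Reuse the old server ID so the node rejoins as the same cluster member.
    sid=$("$tool" db-sid "$db") || return 1

    # Random 6-hex-digit suffix, similar in shape to ".init-6d0645".
    suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')
    new="$db.init-$suffix"

    # Create a fresh clustered DB file that joins the existing cluster; it
    # receives its data from the other members once ovsdb-server starts it.
    "$tool" --sid="$sid" join-cluster "$new" "$name" "$local_addr" "$@" || return 1

    # Keep the corrupted file as a backup, then swap in the new one.
    mv "$db" "$db.backup-$suffix" && mv "$new" "$db"
}
```

It would be called before ovsdb-server starts, e.g. recover_if_corrupted /etc/ovn/ovnnb_db.db OVN_Northbound 'tcp:[172.20.0.3]:6643' 'tcp:[172.20.0.4]:6643' 'tcp:[172.20.0.2]:6643', matching the addresses printed in the log. The corrupted file is renamed rather than deleted so it can still be inspected later.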

Check the cluster status:

root@node1:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
f987
Name: OVN_Northbound
Cluster ID: 39da (39da3e2a-2c7e-461c-9bc7-58527c22fe42)
Server ID: f987 (f9879d86-f776-4a13-a07a-626da622ef31)
Address: tcp:[172.20.0.3]:6643
Status: cluster member
Role: follower
Term: 4
Leader: 9e28
Vote: 9e28

Election timer: 5000
Log: [119, 119]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->0000 ->0000 <-a885 <-9e28
Disconnections: 0
Servers:
    a885 (a885 at tcp:[172.20.0.2]:6643) last msg 79796 ms ago
    f987 (f987 at tcp:[172.20.0.3]:6643) (self)
    9e28 (9e28 at tcp:[172.20.0.4]:6643) last msg 1519 ms ago
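The status above shows the rebuilt server back as a cluster member under its original server ID f987, now following leader 9e28. A small sketch of checking that automatically, assuming the "Key: value" layout of cluster/status shown in this thread; the helper name check_rejoined is hypothetical.

```shell
# Hypothetical helper: reads `cluster/status` output on stdin and succeeds only
# if it reports the expected full server ID and "Status: cluster member".
check_rejoined() {
    expected_sid="$1"
    status=$(cat)
    # In a basic regex the parentheses around the full ID are matched literally.
    printf '%s\n' "$status" | grep -q "^Server ID: .*($expected_sid)\$" || return 1
    printf '%s\n' "$status" | grep -q '^Status: cluster member$'
}

# Usage on a node (socket path and DB name as in the commands above):
#   ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
#       | check_rejoined f9879d86-f776-4a13-a07a-626da622ef31 && echo rejoined
```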

zhangzujian marked this pull request as ready for review on October 19, 2022 09:29
Linked issue: When DB is full and recover, ovn-central failed to start