Bug Report: `DeleteBeforeRestore` is ignored in backup engines #16811

rvrangel · 2024-09-20T09:49:41Z

Overview of the Issue

Pulling this discussion out of another PR: Vitess has a flag called DeleteBeforeRestore that is supposed to signal the backup engines if they should wipe out the current MySQL data before restoring a backup, but it is currently being ignored by all backup engines (including the new one introduce in #16295)

We decided to keep the behaviour consistent for now, but this issue is to continue the discussion if we should remove this or fix anything that might be breaking as a result of it.

Reproduction Steps

From the description in here, it seems vtop would be relying on the backup engines ignoring that value so that when the vttablet starts with the --restore-from-backup it still deletes and restores the databases despite the vtdataroot already being filled.

Binary Version

current version on (`main`)

Operating System and Environment details

not OS dependant

Log Fragments

No response

The text was updated successfully, but these errors were encountered:

rvrangel · 2024-09-20T09:52:05Z

when working on #16295, we noticed both xtrabackup and builtin ignore that flag and when we used it as expected in mysqlshell, @frouioui saw the issues with the restore failing because the data was still there when the restore started.

For us it wasn't an issue since we either restore on an empty tablet or we call the RestoreFromBackup RPC (which would cause it to be wiped), but it seems that the vtop tooling doesn't clean tablets once they are restored and relies on this "bug" to complete the restore?

deepthi · 2024-09-25T21:33:41Z

TL;DR there is no bug.

Let's look at these functions from restore.go

func (tm *TabletManager) RestoreData(...
	deleteBeforeRestore bool,...

and

func (tm *TabletManager) restoreDataLocked(..., deleteBeforeRestore bool, ...

RestoreData is called from tm_init.go. Whenever a tablet is started with --restore_from_backup, we call this with deleteBeforeRestore = false. This is because on a simple restart we would not want to delete an existing database. Instead, later on in the call stack, we'll check whether we shouldRestore, and part of that check is to check whether data dir (or database) is non-empty. If the database is empty, we restore from backup, otherwise we just continue with tablet initialization.
restoreDataLocked is called from both RestoreData and from the tablet RPC RestoreFromBackup. When a tablet is told via RPC to restore from a backup, the understanding is that we want to throw away whatever we currently have and start over from a backup. So in that call path, restoreDataLocked is called with deleteBeforeRestore = true.
Regardless of how vttablet is being restarted, whether by vtop or something else, --restore_from_backup does not mean that you throw away existing data. Any change to this behavior and expectation will break most existing Vitess deployments.
The reason this was implemented in this fashion is because it is not easy to change vttablet startup parameters such that the first time a new tablet comes up, it restores from a backup, but subsequently only restarts. So even though the flag is called --restore_from_backup it needs to be understood that the restore only happens if the data dir / database is not already present.

rvrangel · 2024-09-26T11:12:26Z

thanks for looking into @deepthi. My understanding and @frouioui can probably weigh in since it was him that originally reported this behaviour, was that when creating a new tablet in the vitess operator, it would bring up the pod with existing data, then start the vttablet with --restore_from_backup and expect a backup to be restored (even though as you mentioned above, this shouldn't happen).

In that particular case, it seems ShouldRestore() inside restoreDataLocked() returned true even though it should have returned false because deleteBeforeRestore = false and the database already existed on MySQL, but the code was still reaching all the way up to the backup engine's ExecuteRestore()?

frouioui · 2024-10-21T20:25:29Z

Moreover, because we defined a backup configuration on our K8S yaml file, we will start vttablet with the --restore_from_backup flag. Which will try to load any existing backup after the tablets are up and running.

Is what I said in the comment you linked @rvrangel, which is actually wrong and was based on bad assumptions. When the data directory of the tablet is not empty, we will restart the tablet from that directory and not from backup. I created a test cluster to illustrate it:

$ kubectl get pods
NAME                                                         READY   STATUS             RESTARTS        AGE
...
example-vttablet-zone1-0790125915-4e37d9d5                   3/3     Running            1 (7m21s ago)   9m7s
...

# I decide to restart that pod to simulate a vttablet that restarts
$ kubectl delete pod example-vttablet-zone1-0790125915-4e37d9d5
pod "example-vttablet-zone1-0790125915-4e37d9d5" deleted

# Let's look at the logs
$ kubectl logs example-vttablet-zone1-0790125915-4e37d9d5
...
I1021 20:17:03.882470       1 restore.go:261] Attempting to restore, but mysqld already contains data. Assuming vttablet was just restarted.
...

ShouldRestore returns false as there is a database and this condition is met:

// Returns (false, nil) if the check succeeds but the condition is not
// satisfied (there is a DB).

I am going to close this issue for now.

rvrangel added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged labels Sep 20, 2024

rvrangel mentioned this issue Sep 20, 2024

adding new mysql shell backup engine #16295

Merged

5 tasks

deepthi added the Component: Backup and Restore label Sep 23, 2024

harshit-gangal removed the Needs Triage This issue needs to be correctly labelled and triaged label Oct 9, 2024

frouioui closed this as not planned Won't fix, can't repro, duplicate, stale Oct 21, 2024

deepthi added the Status: Working as Intended label Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug Report: `DeleteBeforeRestore` is ignored in backup engines #16811

Bug Report: `DeleteBeforeRestore` is ignored in backup engines #16811

rvrangel commented Sep 20, 2024

rvrangel commented Sep 20, 2024

deepthi commented Sep 25, 2024 •

edited

Loading

rvrangel commented Sep 26, 2024

frouioui commented Oct 21, 2024

Bug Report: DeleteBeforeRestore is ignored in backup engines #16811

Bug Report: DeleteBeforeRestore is ignored in backup engines #16811

Comments

rvrangel commented Sep 20, 2024

Overview of the Issue

Reproduction Steps

Binary Version

Operating System and Environment details

Log Fragments

rvrangel commented Sep 20, 2024

deepthi commented Sep 25, 2024 • edited Loading

rvrangel commented Sep 26, 2024

frouioui commented Oct 21, 2024

Bug Report: `DeleteBeforeRestore` is ignored in backup engines #16811

Bug Report: `DeleteBeforeRestore` is ignored in backup engines #16811

deepthi commented Sep 25, 2024 •

edited

Loading