Make backrest suitable for laptop #372

Open
pmozzati opened this issue Jul 4, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@pmozzati

pmozzati commented Jul 4, 2024

Usage description
I'd like to use backrest on my laptop. Like many home laptops, I guess, it is typically turned on only when needed, so the machine reboots frequently.
Since backrest always initializes its backup schedule relative to the startup time, it could possibly never start a backup.

Solution proposed
I'd like an option that allows backrest to check its last run and take a new snapshot if none has been taken within the configured time span (a day, a week, etc.).
When I used rclone to back up my data, I wrote a systemd timer unit that executed a script at every boot. If the backup completed successfully, the script wrote a timestamp to a file. On the next boot, it checked the timestamp and, if more than a day had passed, took a new snapshot.

@pmozzati pmozzati added the enhancement New feature or request label Jul 4, 2024
@garethgeorge
Owner

Posting here to ack the feature request and the interest I'm seeing on the bug -- this is definitely something that makes sense to support for laptop users.

My initial thoughts are to support this with a simple enum on the schedule to specify whether it's relative to the last run of the schedule OR whether it's relative to the time the task is scheduled (e.g. backrest startup)

```protobuf
message Schedule {
  oneof schedule {
    bool disabled = 1 [json_name="disabled"]; // disable the schedule.
    string cron = 2 [json_name="cron"]; // cron expression describing the schedule.
    int32 maxFrequencyDays = 3 [json_name="maxFrequencyDays"]; // max frequency of runs in days.
    int32 maxFrequencyHours = 4 [json_name="maxFrequencyHours"]; // max frequency of runs in hours.
  }
}
```

I'm wondering if it may still make sense to ensure that tasks that are scheduled relative to the last run perhaps have a randomized start delay of 0-5 minutes after the process starts to avoid the thundering herd problem on boot.

@jburnham

I think anacron works similarly to systemd timers, storing the time of the last run as the basis for when it next needs to run. The anacrontab also lets you specify a "delay in minutes" before execution to prevent the thundering herd problem you mentioned, and that likely should be supported here too.

Is there a particular reason to support both "relative to the last run" and "relative to the time the task is scheduled"? I don't currently understand how tasks would run differently under each setting.

As long as the next job runs no earlier than its scheduled time (to satisfy the max frequency setting), the next time a laptop wakes from sleep or powers on, it should run the overdue job and schedule the next one from that time. Alternatively, the orchestrator loop could continuously look for plans whose last run is older than the max frequency setting allows. You would lose some of the UI niceness of showing exactly when the next job will run, but I don't personally need that, or the UI could show an "expected" next run time.

I have pre and post hooks that report to healthchecks.io. As long as the backup runs once an hour (my schedule), the check shouldn't complain. I do have a grace period of about 3 days, since I expect backups to stop over the weekend; this lets a backup run on Monday and refresh the check. Any CONDITION_ANY_ERROR sends a /fail to my check immediately, so normally I wouldn't have to wait 3 days to find out I'm not getting any backups. I pause the check if I expect the laptop to be off for an extended period.
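For reference, Healthchecks.io signals are plain HTTP requests to a per-check ping URL: the bare URL reports success, and the `/start` and `/fail` suffixes report that a run started or failed. The Go sketch below builds those URLs; the check UUID is a placeholder, and `pingURL` and `ping` are hypothetical helpers rather than part of Backrest's hook configuration.

```go
package main

import (
	"fmt"
	"net/http"
)

const pingBase = "https://hc-ping.com/"

// pingURL builds a Healthchecks.io ping URL. An empty signal reports
// success; "start" and "fail" report that a run started or failed.
func pingURL(checkUUID, signal string) string {
	if signal == "" {
		return pingBase + checkUUID
	}
	return pingBase + checkUUID + "/" + signal
}

// ping sends one signal. A caller would typically log the error rather than
// fail the backup just because the monitoring service is unreachable.
func ping(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("ping %s: unexpected status %s", url, resp.Status)
	}
	return nil
}

func main() {
	const check = "00000000-0000-0000-0000-000000000000" // placeholder UUID
	fmt.Println(pingURL(check, "start")) // pre-backup hook
	fmt.Println(pingURL(check, ""))      // backup succeeded
	fmt.Println(pingURL(check, "fail"))  // e.g. on CONDITION_ANY_ERROR
}
```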

I admit I haven't run into this issue yet: I only switched my laptop to using Backrest for scheduled jobs today, and only saw this issue afterwards. Previously I used a launchd agent with a StartInterval of 3600 to run restic backups directly, and used Backrest only as a UI to browse my snapshots.

@pmozzati
Author

> I think Anacron works in a similar manner as the systemd timers by storing the time of the last run to use as a basis for when it next needs to run. The Anacrontab also allows you to specify a "delay in minutes" before execution to prevent that thundering herd problem you mentioned, and likely should be supported here.

AFAIK, on systemd-based Linux distros, systemd timer units can take over the role of cron, so systemd can be expected to perform any cron or anacron operation.

I also agree with adding a delay before the backup starts.

I'm not an IT expert, and I'm also interested in understanding the difference between the two approaches to backup scheduling (relative to startup or to the last run).
I think the first is more useful for defining a specific execution time (think of a production server that takes a snapshot when no one is using its services, e.g. at a certain hour during the night), while the second is better for taking at least one snapshot (say, once a day) as soon as possible if none has been taken within a specified time span.
But, as I said, I may be wrong, since I don't actually know Backrest's internals.

@VFansss

VFansss commented Jul 18, 2024

This is basically the only thing that makes me raise an eyebrow: the current behavior is usable on a NAS that ideally never shuts down, but it is quite dangerous if backrest is used on a desktop PC.

Since I installed Backrest (14 July), it has never started a daily backup!

Of course, I shut down my PC, and at the next startup Backrest simply doesn't consider that a previously scheduled run is overdue; it creates a new one for tomorrow, which never comes, because when I turn on the PC tomorrow it will schedule one for the day after:

[image]

I don't know what a good approach to solving this would be, but I guess there should be a way to check whether a "scheduling tick" should have happened since the day of the last backup. If so, start the backup.

> I'm wondering if it may still make sense to ensure that tasks that are scheduled relative to the last run perhaps have a randomized start delay of 0-5 minutes after the process starts to avoid the thundering herd problem on boot.

I don't think a 5-minute delay is a big issue for the PC/laptop use case. Certainly not as big as a daily backup that never starts!

@garethgeorge
Owner

Implemented some initial support for "relative scheduling" (i.e. relative to the last run of a task) in #439.

I'll probably refine this a bit in a followup, as I'm noticing that the number of modes I've now added clutters the scheduling UI and makes it hard to pick a sane default. It looks like this at the moment:

[image]

which is just too many options :)

I'll follow up with work to filter that set down to a reasonable spread that covers most use cases. I'm also interested in input on which of these options are useful. I may simply deduplicate the "every N hours" and "every N days" options, since only one of those is necessary.

Also planning a bit of followup work to implement a startup delay for tasks to address the "thundering herd" issue discussed earlier, and with that done I expect this support to ship in 0.15.0.

@garethgeorge garethgeorge self-assigned this Aug 27, 2024
@pmozzati
Author

pmozzati commented Sep 3, 2024

Many thanks for your development effort. In my opinion, you can safely remove the "every N days" option from the backup scheduling UI ("every N hours" is enough), although it could still be useful for prune and forget.
Alternatively, have you considered drop-down lists? It could be something like:

Backup schedule: every [input value; 0 = disabled] (hours; days) relative to (startup time, last run)

where parentheses contain the drop-down list options and square brackets mark an input field (0 means disabled).
I hope this is clear despite my bad English.

@colans

colans commented Sep 20, 2024

@garethgeorge wrote:

> Also planning a bit of followup work to implement a startup delay for tasks to address the "thundering herd" issue discussed earlier, and with that done I expect this support to ship in 0.15.0.

This works really well for me in my /etc/anacrontab, running Restic via Backupninja:

1 5 backupninja if [ -x /usr/sbin/backupninja ]; then /usr/sbin/backupninja --now; fi

So that's:

  • run every day (period of 1 day)
  • wait 5 minutes after anacron starts before running the job

For BackUpScale, I'm really looking forward to replacing the current client front end (Backupninja, which partially provides this through a curses interface) with Backrest, but Backrest needs to support anacron-style scheduling (or something like it). I'm really happy that it already supports Windows, as we need that too.

@garethgeorge
Owner

garethgeorge commented Sep 20, 2024

Updating to say this has shipped in https://github.com/garethgeorge/backrest/releases/tag/v1.5.0 :)

Re: thundering herd, I ended up releasing without a startup delay for the time being. I think it can be worked around using the new support for retry policies on hooks, e.g. polling until the network is online or similar.

I use this hook on my repos:

[image]

Re: polling invocation, Backrest doesn't support a short-lived polling mode -- it always launches as a daemon and runs in the background. For what it's worth, multi-tenancy and remote hosting for Backrest are on my longer-term plans (probably 2025).

@colans

colans commented Sep 26, 2024

If exponential backoff works here, then it should be sufficient. (Anacron simply fails hard.)

When you say "remote hosting", do you mean being able to run this? I just assumed it was already possible:

  • restic --option rclone.program="ssh [email protected] rclone" --option rclone.args="serve restic --stdio --b2-hard-delete --drive-use-trash=false --verbose" --repo rclone:remote-storage:username-bucketname --password-file /etc/restic/password --verbose subcommand

@colans

colans commented Sep 26, 2024

On the other hand, @garethgeorge, if you mean you're interested in providing a remote hosting service, then you and I should talk.

@garethgeorge
Owner

garethgeorge commented Sep 26, 2024

Hey, sorry for the late reply -- happy to discuss more on this if you're looking at building a service around backrest.

I expect to provide a remote management solution for Backrest, e.g. one where the operation log and encrypted configuration can be synchronized with a remote self-hosted instance of Backrest (or optionally one hosted in the cloud -- TBD what that aspect will look like). A future possibility is also remote execution of health operations (e.g. check / prune), but going in that direction needs significant thought about the security model!

I would add that I'm largely motivated by a belief that the market needs an open backup solution (open in the sense of free forever, and also in the sense of understandability -- you can read the code and know what's being done with your data). Part of knowing your data is secure is knowing that you can fully copy the tools used to archive it and ensure you'll have access to working (or at least debuggable) versions of them in the future as well! So I hope that anything building on Backrest extends that philosophy and is self-hostable and OSS in its own right :)
