Killing rsync not handled gracefully / rsync exit code not evaluated #489

Closed
kerner1000 opened this issue Nov 8, 2015 · 3 comments
@kerner1000

When I abort a running backup with
sudo killall rsync, BIT does not flag the snapshot as WITH ERRORS!

@Germar
Member

Germar commented Nov 11, 2015

BIT doesn't handle return codes from rsync yet. It's on my to-do list, but first I need to switch all processes to be called via subprocess.Popen.
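As an illustration of what evaluating the return code through subprocess.Popen could look like, here is a minimal sketch. It is not BIT's actual code: the helper name `run_and_check` is hypothetical, and a small Python child process stands in for rsync so the example runs anywhere. Exit code 23 is used because rsync documents it as "partial transfer due to error".

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd via Popen and return (ok, returncode, stderr_text).

    Hypothetical helper for illustration only, not part of BIT.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() waits for the process to finish and drains both pipes.
    _out, err = proc.communicate()
    # Anything other than 0 is treated as a failure in this sketch.
    return proc.returncode == 0, proc.returncode, err


if __name__ == "__main__":
    # Stand-in for an rsync call that ends with exit code 23.
    fake_rsync = [sys.executable, "-c", "import sys; sys.exit(23)"]
    ok, code, _err = run_and_check(fake_rsync)
    if not ok:
        print(f"command failed with exit code {code}")
```

A caller could use the `ok` flag to decide whether a snapshot should be marked as failed. Which rsync exit codes count as fatal is a policy decision that this sketch leaves open.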

@Germar Germar added this to the 1.2.0 milestone Aug 2, 2016
@Germar Germar modified the milestones: 1.2.0, 1.3.0 Nov 10, 2018
@protist

protist commented Sep 23, 2020

For the record, I ran into this issue too. I had a network connection go down, and my backup was aborted. Despite setting backintime to delete the backup on errors, this meant I had an empty backup directory for that particular timepoint. The next backup was thus not incremental, and took a massive amount of space. Unfortunately this happened several times several months ago, and I only noticed it now.

To fix it, I moved the bad backup(s) to a temporary directory, created a hard-link copy of the last good backup, replaced the changed files, then deleted the old directory.

For example, where 20200719-160001-721 is a bad backup and 20200430-061501-472 is the last good backup:

$ mv 20200719-160001-721 tmp/
$ sudo rsync -aPHAX --link-dest=/path/to/backintime/hostname/root/1/20200430-061501-472 20200430-061501-472/ 20200719-160001-721
$ sudo rsync -aPHAX tmp/20200719-160001-721/ 20200719-160001-721 --delete
$ sudo rm -r tmp/20200719-160001-721

I imagine there are faster ways to do this. For my ~430G backup, the hard-link copy took 15 minutes, and the replacement rsync took 1h 36m (!).

Moving forward, I have a cron job on my server to check the log file for errors (then delete the last backup if necessary). Something like:

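#!/bin/sh

set -e

cd /path/to/backintime/hostname/root/1

# grep exits with status 1 when nothing matches; "|| true" keeps
# "set -e" from aborting the script when the log contains no errors.
errors=$(bzcat last_snapshot/takesnapshot.log.bz2 | grep -B3 '^\[I\] Take snapshot (rsync: rsync error' || true)

if [ -n "$errors" ]; then
  echo 'Errors found! Log shows:'
  printf '%s\n\n' "$errors"

  # Delete the snapshot directory that last_snapshot points to.
  printf '%s' "Removing $(realpath last_snapshot) . . . "
  rm -r "$(realpath last_snapshot)"
  echo removed!

  # Re-point last_snapshot at the newest remaining snapshot.
  rm last_snapshot
  ln -s "$(ls -d 20* | tail -1)" last_snapshot
fi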

@aryoda
Contributor

aryoda commented Aug 16, 2023

For the record, I ran into this issue too. I had a network connection go down, and my backup was aborted. Despite setting backintime to delete the backup on errors, this meant I had an empty backup directory for that particular timepoint. The next backup was thus not incremental, and took a massive amount of space.

@protist This annoying side effect may also happen (not in your case here) when the option "Take a new snapshot whether there were changes or not" is enabled and errors prevented some files from being backed up. I think the more files are missing, the more disk space is wasted in the next snapshot, because the missing files are copied instead of hard-linked.

Anyhow, I have now fixed this issue by also evaluating the rsync return code, which also covers rsync being terminated by a process signal like SIGKILL, and by sending an error message and writing the error into the snapshot log:

See my PR #1502

Edit: With my PR you can now also use the error user callback to, e.g., send an email. The error user callback is now called for all error cases when taking a snapshot (except external events like a computer crash or power failure).

PS: killall sends SIGTERM (signal 15 on Ubuntu) by default (see man killall), and this signal is now also handled gracefully in BiT (shown as an "error" in the [last] snapshot log).
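For readers wondering how a signal shows up in a return code: Python's subprocess module reports a child that was terminated by signal N as return code -N on POSIX systems. The sketch below turns such a value into a readable message. The function name `describe_returncode` is hypothetical and not taken from the PR; it only illustrates the convention.

```python
import signal


def describe_returncode(returncode):
    """Describe a subprocess return code in words (illustrative helper)."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # Negative values mean "terminated by signal -returncode" (POSIX).
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exit code {returncode}"


if __name__ == "__main__":
    for code in (0, 23, -15):
        print(code, "->", describe_returncode(code))
```

The same lookup is available on the command line via kill -l, as the log message in the commit suggests.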

aryoda added a commit that referenced this issue Aug 17, 2023
…lling rsync not handled gracefully / rsync exit code not evaluated) (#1502)

Fix bug: Taking a snapshot reports `rsync` errors now even if no snapshot was taken (#1491)
Fix bug: takeSnapshot() recognizes errors now by also evaluating the rsync exit code (#489)
         Fixes related problem: Killing `rsync` was not handled gracefully (by ignoring the rsync exit code)
Fix bug: The error user-callback is now always called if an error happened during taking a snapshot (#1491)
Feature: Introduce new error codes for the "error" user callback (as part of #1491):
             5: Error while taking a snapshot.
             6: New snapshot taken but with errors.
Improvement: The `rsync` exit code is now contained in the snapshot log. Example:
             [E] Error: 'rsync' ended with exit code -9 (negative values are signal numbers, see 'kill -l')
Fix CHANGES entries (stick to our standards)

---------

Co-authored-by: aryoda <[email protected]>
@aryoda aryoda closed this as completed Aug 20, 2023