
pyhardlinkbackup

Hardlink/Deduplication Backups with Python.

  • Backups should be saved as normal files in the filesystem:
    • accessible without any extra software or extra meta files
    • non-proprietary format
  • Create backups with versioning
    • every backup run creates a complete filesystem snapshot tree
    • every snapshot tree can be deleted, without affecting the other snapshots
  • Deduplication with hardlinks:
    • Store only changed files, all others via hardlinks
    • find duplicate files everywhere (even renamed or moved files)
  • usable under Windows and Linux
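A minimal sketch of the hardlink deduplication idea, using only the Python standard library. This is not pyhardlinkbackup's own code. The SHA-512 content hash and the in-memory dict `seen` (hypothetical, mapping a content hash to an already backed-up path) are assumptions for illustration; phlb itself keeps its file records in a database.

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Dict


def file_hash(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-512 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_file(src: Path, dst: Path, seen: Dict[str, Path]) -> bool:
    """Copy src to dst, or hardlink dst to an identical, already backed-up file.

    `seen` is a hypothetical index: content hash -> path of an existing backup copy.
    Returns True if a hardlink was created, False if the data was copied.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    digest = file_hash(src)
    existing = seen.get(digest)
    if existing is not None and existing.exists():
        os.link(existing, dst)  # same content: only a new directory entry, no new data
        return True
    shutil.copy2(src, dst)  # new content: store a real copy (copy2 keeps the mtime)
    seen[digest] = dst
    return False
```

Because the lookup is by content hash and not by name, a renamed or moved file still ends up as a hardlink. Hardlinks only work within one filesystem, so all snapshots must live on the same volume.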

Requirement: Python 3.6 or newer.

Please: try, fork and contribute! ;)

Build status (GitHub Actions): github.com/jedie/pyhardlinkbackup/actions
Build status (Travis CI): travis-ci.org/jedie/pyhardlinkbackup
Build status (AppVeyor): ci.appveyor.com/project/jedie/pyhardlinkbackup
Coverage status (Coveralls): coveralls.io/r/jedie/pyhardlinkbackup
Requirements status (requires.io): requires.io/github/jedie/pyhardlinkbackup/requirements/

Example

$ phlb backup ~/my/important/documents
...start backup, some time later...
$ phlb backup ~/my/important/documents
...

This will create deduplication backups like this:

~/pyhardlinkbackups
  └── documents
      ├── 2016-01-07-085247
      │   ├── phlb_config.ini
      │   ├── spreadsheet.ods
      │   ├── brief.odt
      │   └── important_files.ext
      └── 2016-01-07-102310
          ├── phlb_config.ini
          ├── spreadsheet.ods
          ├── brief.odt
          └── important_files.ext
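Unchanged files in the two snapshot directories are hardlinks to the same data, so deleting one snapshot tree does not affect the other; the data is only freed when its last link is removed. The following standalone demonstration uses only the standard library (the names mirror the tree above; `snapshot_demo()` is a hypothetical helper and nothing here calls phlb):

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def snapshot_demo(root: Path) -> Tuple[int, int, bytes]:
    """Build two snapshot dirs sharing one hardlinked file, then delete the older one."""
    old = root / "documents" / "2016-01-07-085247"
    new = root / "documents" / "2016-01-07-102310"
    old.mkdir(parents=True)
    new.mkdir()

    (old / "brief.odt").write_bytes(b"unchanged content")  # first run: a real copy
    os.link(old / "brief.odt", new / "brief.odt")          # second run: only a hardlink

    links_before = os.stat(new / "brief.odt").st_nlink     # two names, one set of data
    shutil.rmtree(old)                                     # delete the older snapshot tree
    links_after = os.stat(new / "brief.odt").st_nlink
    return links_before, links_after, (new / "brief.odt").read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(snapshot_demo(Path(tmp)))  # (2, 1, b'unchanged content')
```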

Installation

Windows

  1. install Python 3: https://www.python.org/downloads/
  2. Download the file boot_pyhardlinkbackup.cmd
  3. call boot_pyhardlinkbackup.cmd as admin (Right-click and use Run as administrator)

If everything works fine, you will get a venv here: %ProgramFiles%\PyHardLinkBackup

After the venv is created, call these scripts to finalize the setup:

  1. %ProgramFiles%\PyHardLinkBackup\phlb_edit_config.cmd - create a config .ini file
  2. %ProgramFiles%\PyHardLinkBackup\phlb_migrate_database.cmd - create database tables

To upgrade pyhardlinkbackup, call:

  1. %ProgramFiles%\PyHardLinkBackup\phlb_upgrade_pyhardlinkbackup.cmd

To start the Django webserver, call:

  1. %ProgramFiles%\PyHardLinkBackup\phlb_run_django_webserver.cmd

Linux

  1. Download the file boot_pyhardlinkbackup.sh
  2. call boot_pyhardlinkbackup.sh

If everything works fine, you will get a venv here: ~/PyHardLinkBackup

After the venv is created, call these scripts to finalize the setup:

  • ~/PyHardLinkBackup/phlb_edit_config.sh - create a config .ini file
  • ~/PyHardLinkBackup/phlb_migrate_database.sh - create database tables

To upgrade pyhardlinkbackup, call:

  • ~/PyHardLinkBackup/phlb_upgrade_pyhardlinkbackup.sh

To start the Django webserver, call:

  • ~/PyHardLinkBackup/phlb_run_django_webserver.sh

Starting a backup run

To start a backup run, use this helper script:

  • Windows batch: %ProgramFiles%\PyHardLinkBackup\pyhardlinkbackup_this_directory.cmd
  • Linux shell script: ~/PyHardLinkBackup/pyhardlinkbackup_this_directory.sh

Copy this file into the directory that should be backed up and run it there to start a backup.

Verifying an existing backup

$ cd ~/PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb verify --fast ~/PyHardLinkBackups/documents/2016-01-07-102310

With --fast, the files' contents will not be checked. Without --fast, a hash of each file's contents is calculated and compared, so every file must be read completely from the filesystem, which can take some time.

A verify run does:

  • Do all files in the backup exist?
  • Compare file sizes
  • Compare hashes from hash-file
  • Compare files' modification timestamps
  • Calculate hashes from files' contents and compare them (will be skipped if --fast used)
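These checks could look roughly like this for a single file. This is a hedged sketch, not phlb's code. Assumptions: the hash file lies next to the backed-up file as `<name>.sha512` and holds the SHA-512 hex digest, and the expected size, mtime and hash (hypothetical parameters of `verify_file()`) come from the database.

```python
import hashlib
from pathlib import Path
from typing import List

HASH_SUFFIX = ".sha512"   # assumption: hash file is "<name>.sha512" next to the file
MTIME_TOLERANCE = 0.001   # seconds; assumption, to absorb float rounding


def sha512_of(path: Path) -> str:
    """Hash a file's contents in 1 MiB chunks (this reads the whole file)."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: Path, expected_size: int, expected_mtime: float,
                expected_hash: str, fast: bool = False) -> List[str]:
    """Return the problems found for one backed-up file; an empty list means OK."""
    if not path.is_file():
        return ["file missing"]
    problems = []
    st = path.stat()
    if st.st_size != expected_size:
        problems.append("size mismatch")
    hash_file = path.with_name(path.name + HASH_SUFFIX)
    if not hash_file.is_file():
        problems.append("hash file missing")
    elif hash_file.read_text().strip() != expected_hash:
        problems.append("hash file mismatch")
    if abs(st.st_mtime - expected_mtime) > MTIME_TOLERANCE:
        problems.append("mtime mismatch")
    if not fast and sha512_of(path) != expected_hash:  # the slow part, skipped with --fast
        problems.append("content hash mismatch")
    return problems
```

Only the last check reads the file contents, which is why --fast makes such a difference on large backups.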

Configuration

phlb will use a configuration file named: PyHardLinkBackup.ini

Search order is:

  1. current directory up to the root
  2. user directory

E.g. if the current working directory is /foo/bar/my_files/, then the search path will be:

  • /foo/bar/my_files/PyHardLinkBackup.ini
  • /foo/bar/PyHardLinkBackup.ini
  • /foo/PyHardLinkBackup.ini
  • /PyHardLinkBackup.ini
  • ~/PyHardLinkBackup.ini (the user's home directory under Windows/Linux)
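The search order can be expressed with pathlib. A minimal sketch that returns the first existing file; `config_candidates()` and `find_config()` are hypothetical names, not phlb functions, and how phlb combines the result with its defaults is not shown here:

```python
from pathlib import Path
from typing import List, Optional

CONFIG_NAME = "PyHardLinkBackup.ini"


def config_candidates(cwd: Path, home: Path) -> List[Path]:
    """cwd and each of its parents up to the root, then the user's home directory."""
    return [d / CONFIG_NAME for d in (cwd, *cwd.parents)] + [home / CONFIG_NAME]


def find_config(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None."""
    if cwd is None:
        cwd = Path.cwd()
    if home is None:
        home = Path.home()
    for candidate in config_candidates(cwd, home):
        if candidate.is_file():
            return candidate
    return None
```

For cwd=/foo/bar/my_files, `config_candidates()` yields exactly the five paths listed above, in that order.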

Create / edit default .ini

To open the .ini file in the user directory in your editor, just run:

(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb config

The defaults are stored here: /phlb/config_defaults.ini

Excluding files/folders from backup:

There are two ways to exclude files/folders from your backup. Use the following settings in your PyHardLinkBackup.ini:

# Directory names that will be recursively excluded from backups (comma separated list!)
SKIP_DIRS= __pycache__, temp

# glob-style patterns to exclude files/folders from backups (used with Path.match(), Comma separated list!)
SKIP_PATTERNS= *.pyc, *.tmp, *.cache

The filesystem scan is divided into two steps:

  1. Just scan the filesystem tree
  2. Filter and load metadata for every directory item

SKIP_DIRS is used in the first step, SKIP_PATTERNS in the second step.
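The two scan steps can be sketched with the standard library. This illustrates the idea under assumptions and is not phlb's scanner: here SKIP_DIRS entries are compared with plain directory names and pruned during `os.walk()`, so excluded subtrees are never entered, while SKIP_PATTERNS are applied with `PurePath.match()` to files only, before `os.stat()` is called. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def parse_list(value: str) -> List[str]:
    """Split a comma separated .ini value such as ' __pycache__, temp' into items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def scan_tree(top: str, skip_dirs: List[str]) -> Iterator[Path]:
    """Step 1: walk the tree; directories named in skip_dirs are pruned in place."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield Path(dirpath, name)


def filter_and_stat(paths: Iterable[Path],
                    skip_patterns: List[str]) -> Iterator[Tuple[Path, os.stat_result]]:
    """Step 2: drop paths matching a glob pattern, load metadata for the rest."""
    for path in paths:
        if any(path.match(pattern) for pattern in skip_patterns):
            continue
        yield path, path.stat()
```

Usage with the values from the example config: `filter_and_stat(scan_tree(top, parse_list("__pycache__, temp")), parse_list("*.pyc, *.tmp, *.cache"))`. Since a relative pattern like `*.pyc` is matched from the right, it excludes `.pyc` files at any depth.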

Upgrading pyhardlinkbackup

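To upgrade to a new version, just start this helper script:

  • Windows: %ProgramFiles%\PyHardLinkBackup\phlb_upgrade_pyhardlinkbackup.cmd
  • Linux: ~/PyHardLinkBackup/phlb_upgrade_pyhardlinkbackup.sh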

Some notes

What is 'phlb' and 'manage' ?!?

phlb is a CLI.

manage is similar to a normal Django manage.py, but it always uses the pyhardlinkbackup settings.

Why in hell do you use Django?!?

  • Well, just because of the great database ORM and the Admin Site. ;)

How to go into the Django admin?

Just start:

  • Windows: phlb_run_django_webserver.cmd
  • Linux: phlb_run_django_webserver.sh

Then open 'localhost' in your web browser. (Note: --noreload is needed under Windows with a venv!)

Running the unit tests

Just start: phlb_run_tests.cmd / phlb_run_tests.sh or do this:

$ cd ~/PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ manage test

Using the CLI

$ cd ~/PyHardLinkBackup/
~/PyHardLinkBackup $ source bin/activate
(PyHardLinkBackup) ~/PyHardLinkBackup $ phlb --help
Usage: phlb [OPTIONS] COMMAND [ARGS]...

  pyhardlinkbackup

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  add     Scan all existing backup and add missing ones...
  backup  Start a Backup run
  config  Create/edit .ini config file
  helper  link helper files to given path
  verify  Verify a existing backup

Add missing backups to the database

phlb add can be used in different scenarios:

  • recreate the database
  • add a backup manually

phlb add does this:

  • scan the complete file tree under BACKUP_PATH (default: ~/PyHardLinkBackups)
  • recreate all hash files
  • add all files to database
  • deduplicate with hardlinks, if possible
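The deduplication part of phlb add can be illustrated like this: hash every file below a directory and replace later duplicates with hardlinks to the first copy found. A sketch only, assuming SHA-512 content hashes and comparing content alone (metadata such as mtime is ignored here, and hardlinked files share it anyway); `dedupe_tree()` and the `.phlb-tmp` suffix are hypothetical.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def sha512_of(path: Path) -> str:
    """Hash a file's contents in 1 MiB chunks."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_tree(root: Path) -> int:
    """Replace duplicate files under root with hardlinks; return how many were replaced."""
    first_seen: Dict[str, Path] = {}
    replaced = 0
    # Materialize the file list first, so temp files created below are not scanned.
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    for path in files:
        original = first_seen.setdefault(sha512_of(path), path)
        if original == path or os.path.samefile(original, path):
            continue  # first occurrence of this content, or already hardlinked
        tmp = path.with_name(path.name + ".phlb-tmp")
        os.link(original, tmp)  # new name for the existing data
        os.replace(tmp, path)   # swap the duplicate for the hardlink in one step
        replaced += 1
    return replaced
```

Linking to a temporary name and then calling `os.replace()` means the duplicate's path never disappears, even if the run is interrupted halfway. Running it a second time replaces nothing, because already linked files are recognized with `os.path.samefile()`.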

So it's possible to recreate the complete database:

  • delete the current .sqlite file
  • run phlb add to recreate

Another scenario is e.g.:

  • DSLR images are stored on a network drive.
  • You already have a copy of all files locally.
  • You would like to add the local copy to pyhardlinkbackup.

Do the following steps:

  • move the local files to a subdirectory below BACKUP_PATH
  • e.g.: ~/PyHardLinkBackups/pictures/2015-12-29-000015/
  • Note: the date format in the subdirectory name must match the SUB_DIR_FORMATTER in your config
  • run: phlb add
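Before running phlb add, it can help to check that each subdirectory name really parses with the configured format. A sketch, assuming SUB_DIR_FORMATTER is "%Y-%m-%d-%H%M%S" (inferred from the example directory names in this README, not copied from config_defaults.ini); `is_valid_backup_dir_name()` is a hypothetical helper:

```python
from datetime import datetime

# Assumption: matches names like "2015-12-29-000015"; check your own config.
SUB_DIR_FORMATTER = "%Y-%m-%d-%H%M%S"


def is_valid_backup_dir_name(name: str, fmt: str = SUB_DIR_FORMATTER) -> bool:
    """True if name parses with fmt and formats back to exactly the same string."""
    try:
        return datetime.strptime(name, fmt).strftime(fmt) == name
    except ValueError:
        return False
```

The round trip through `strftime()` rejects names that `strptime()` accepts loosely, e.g. a month without its leading zero.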

Now you can run phlb backup from your network drive to make a new, up-to-date backup.

Windows Development

Some notes about setting up a development environment on Windows: /dev/WindowsDevelopment.creole

Alternative solutions

See also: https://github.com/restic/others#list-of-backup-software
