
Rsync : notes about backups

This article summarizes some obvious stuff that is easy to forget (and costs time in debugging). So here are some basic + checked concepts I can trust the next time I don't understand what's going on !

When using Rsync's --backup, --backup-dir and --suffix options :

backup files :

  • are on the destination end of the transfer
  • represent what was there before running rsync
  • by default have no suffix when backups are in a distinct directory

when, after rsync, the destination shows :

  file1
  file1suffix
  file2suffix
  file3

it means :
  • file1 changed either on the source or on the destination, hence the backup. You can't determine where the change occurred.
  • file2 no longer exists on the source, so rsync deleted it from the destination (this requires --delete), and only the backup remains
  • as for file3 we can't know whether :
    • it was already there, unchanged (both on source and destination), then untouched by rsync
    • or it's been created on the source side and just transferred by rsync

Regarding the suffix value :

A fixed string :

This is what may happen if :
  • you hardcode --suffix=myStringThatNeverChanges
  • or if that string is the date (e.g. 2020-06-04) and you run several rsync on the same day (i.e. with the same suffix)
(in the contents columns, + separates the lines of the file)

| Round description | Source : filename | Source : contents | Destination : filename | Destination : contents | Comment |
| --- | --- | --- | --- | --- | --- |
| setup | myFile | Hello world | (doesn't exist yet) | | |
| rsync | (no change) | | myFile | Hello world | |
| edit | myFile | Hello world + edit 1 | (no change) | | |
| rsync --backup --suffix=_BACKUP | (no change) | | myFile | Hello world + edit 1 | |
| | | | myFile_BACKUP | Hello world | backup created |
| edit | myFile | Hello world + edit 1 + edit 2 | (no change) | | |
| rsync --backup --suffix=_BACKUP | (no change) | | myFile | Hello world + edit 1 + edit 2 | |
| | | | myFile_BACKUP | Hello world + edit 1 | new backup has the same name as the existing backup : overwrite |

Any string changing at every rsync round :

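(in the contents columns, + separates the lines of the file)

| Round description | Source : filename | Source : contents | Destination : filename | Destination : contents | Comment |
| --- | --- | --- | --- | --- | --- |
| setup | myFile | Hello world | (doesn't exist yet) | | |
| rsync | (no change) | | myFile | Hello world | |
| edit | myFile | Hello world + edit 1 | (no change) | | |
| rsync --backup --suffix=_BACKUP1 | (no change) | | myFile | Hello world + edit 1 | |
| | | | myFile_BACKUP1 | Hello world | backup 1 created |
| edit | myFile | Hello world + edit 1 + edit 2 | (no change) | | |
| rsync --backup --suffix=_BACKUP2 | (no change) | | myFile | Hello world + edit 1 + edit 2 | |
| | | | myFile_BACKUP2 | Hello world + edit 1 | backup 2 created |
| | | | myFile_BACKUP1_BACKUP2 | Hello world | no myFile_BACKUP1 on the source, so rsync deleted it from the destination (which implies --delete was in use) and backed it up under the new suffix. Remember ? |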

Summary :

| Format of the suffix value | Number of versions kept back |
| --- | --- |
| fixed string | 1 |
| any string changing at every rsync round | 1 for every round |

Rsync can delete millions of files quicker than rm and find

  1. mkdir /tmp/empty
  2. rsync -a --delete /tmp/empty/ directory/to/delete/
    Keep the trailing slash on the source (/tmp/empty/) : it means "the contents of this directory". Without it, rsync would copy the empty directory itself into the target instead of emptying it.
  3. rmdir /tmp/empty

rsync

Usage :

rsync is a great tool to synchronize directory trees on a local machine or between a local and a remote machine.

By default, rsync transfers files having either a different size or time of last modification between the source and the destination. This is known as rsync's quick check (details : man -P 'less -p "quick check"' rsync).
rsync can be instructed to use other criteria to match / skip files : see --checksum, --size-only and --ignore-times in the flags below.

When gathering files in a temporary directory in order to rsync them to some distant place (i.e. developing a delivery script), keep in mind that their modification time is the instant they were cp'ed into that temporary directory, even though their content has not changed. Thus, by default, rsync will always consider them as different from those of the distant storage.
Solution : use the -c or -t flags or cp -p.

Exit Status :

Code Meaning
0 Success
1 Syntax or usage error
3 Errors selecting input/output files, dirs
and many more, see : man -P 'less -p "^EXIT VALUES"' rsync

Flags :

Flag Usage
-a --archive archive mode, equivalent to -rlptgoD, i.e. :
recursive + preserves (symlinks + permissions + times + group + owner + devices and special files)
--backup
  1. make a backup of the destination file if :
    • it changes (changed either on the source or destination)
    • it is deleted (deleted from source + --delete flag)
  2. apply the corresponding suffix
To be used with --backup-dir and --suffix. See also some extra notes.
--backup-dir=dir make backups in a hierarchy based on dir (default : each backup stays in the same directory as the file it backs up)
--bwlimit=n limit I/O bandwidth to n KBytes per second
-c --checksum skip based on checksum, not modification time or size (i.e. if checksums match on both ends, skip file)
-D short for --devices --specials
--delete delete files that don't exist on sender
This conflicts with options altering the list of files to synchronize, such as --include-from, --exclude, ...
--devices transfer character and block device files to the remote system to recreate these devices. Requires root privileges
--exclude=pattern exclude files matching pattern from the list of files to be transferred. Use as many --exclude options as the number of patterns to exclude.
--exclude-from=file exclude files matching patterns listed in file
--files-from=file read list of source file names from file
  • file must contain an explicit list of all files to transfer : no wildcards allowed. For pattern matching on file names, consider --include-from
  • files listed in file must have a path relative to sourceDir (details : man -P 'less -p "--files-from=FILE$"' rsync) :
    rsync [options] /path/to/sourceDir destinationDir --files-from=file
  • Using --files-from does not make /path/to/sourceDir optional !
-g --group preserve group
-H --hard-links preserve hard links
-I --ignore-times synchronize everything, skip nothing
This option name is a little puzzling since it suggests that only the file size will be used to skip files, which is not true : files having the same size will be sent anyway. If you actually want to only consider the file size to skip files, consider --size-only.
-i --itemize-changes list the changed files and the cause of each change, coded as an 11-character string
details : man -P 'less -p "-i, --itemize-changes$"' rsync
--ignore-existing ignore files that already exist on receiver
--include-from=file include files matching patterns listed in file
-L --copy-links dereference symlinks (i.e. transform symlink into referent file or directory)
-l --links copy symlinks as symlinks
-n --dry-run
  • show what would have been transferred
  • often used with --verbose and --itemize-changes
  • since no data is transferred in this mode :
    • --progress has no effect
    • statistics about sent/received bytes and speedup are wrong
-O --omit-dir-times omit directories from --times
-o --owner preserve owner (root only)
-p --perms preserve permissions
--progress show progress during transfer (for each file)
-q --quiet suppress non-error messages
-R --relative use Relative path names
-r --recursive recurse into directories
--remove-source-files sender removes synchronized files (non-dir)
--size-only only transfer files whose size has changed (modification times are ignored). This is useful when starting to use rsync after using another mirroring system which may not preserve timestamps exactly.
--specials transfer special files such as named sockets and fifos
--suffix=suffix append suffix to backup file names. Defaults :
  • (empty string) when --backup-dir is specified (hence which files are backups is explicit)
  • ~ otherwise
-t --times preserve times
-u --update update only : don't overwrite newer files
-v --verbose verbose mode
-W --whole-file copy files whole : disables the incremental algorithm (source)
-z --compress compress data
--compress-level=level set the compression level to use
  • implies --compress when level > 0
  • 0 ≤ level ≤ 9
  • defaults to 6
--skip-compress=list Override the list of file suffixes that will not be compressed
  • list should be one or more file suffixes (without the dot) separated by slashes /
  • entries can contain pattern-matching characters : --skip-compress=gz/jpg/mp[34]/7z/bz2
  • default list of suffixes that will not be compressed (may evolve between Rsync versions) :
    7z ace avi bz2 deb gpg gz iso jpeg jpg lz lzma lzo mov mp3 mp4 ogg png rar rpm rzip tbz tgz tlz txz xz z zip

Example :

Basic usage, subtlety about the trailing slash / :

rsync -avz /source/directory /destination/directory
Recursively transfer all files from /source/directory into /destination/directory in archive mode + use compression to reduce the size of data portions of the transfer.
rsync -avz /source/directory/ /destination/directory
A trailing slash / on the source avoids creating an additional directory level at the destination. This means copy the contents of this directory as opposed to copy the directory itself. In both cases the attributes of the containing directory are transferred to the containing directory on the destination.
These commands are equivalent :
rsync -av /src/foo	/dest
rsync -av /src/foo/	/dest/foo

Local to local, when the destination is a FAT32 filesystem

rsync -rvz --size-only /source/directory /destination/directory

Remote to local over SSH

rsync -avz -e ssh [source] [destination]

To skip password input :

  1. setup SSH accounts + SSH keys on local + remote hosts
  2. build a custom SSH configuration file
  3. instruct rsync over SSH to use this configuration file (mind the quotes) :
    rsync -avz -e "ssh -F /home/bob/.ssh/config -i /home/bob/.ssh/id_rsa" bob@remoteHost:/path/to/file /destination/directory

When rsyncing over SSH in a script like :
#!/usr/bin/env bash

tmpDirectory='/path/to/directory/'

rsync -avz -e 'ssh -i /home/bob/.ssh/id_rsa' bob@serverA:/etc/lighttpd/*conf "$tmpDirectory"
rsync -avz -e 'ssh -i /home/bob/.ssh/id_rsa' "$tmpDirectory" bob@serverB:/etc/lighttpd/
ssh -i /home/bob/.ssh/id_rsa bob@serverB 'sudo /etc/init.d/lighttpd restart'
  • works when copy-pasting the commands into a terminal
  • works if the final SSH command is commented out
  • fails if the final SSH command doesn't explicitly specify the key with -i /path/to/privateKey : the 2nd rsync then prompts for a password. My explanation (to be confirmed) : SSH, which is used for all 3 commands, doesn't like "mixed" key specification (explicit / implicit)

Copy a whole file tree to an empty destination (source)

Turn off the metadata checks and the rsync block checksum algorithm entirely and transfer all of the bits at the source to the dest without any CPU penalty :
Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.