rsync - Description, flags and examples

Rsync : notes about backups

This article summarizes some obvious stuff that is easy to forget (and then costs time in debugging). So here are some basic, checked concepts I can trust the next time I don't understand what's going on !

When using Rsync's --backup, --backup-dir and --suffix options :

Rsync can delete millions of files quicker than rm and find

  1. mkdir /tmp/empty
  2. rsync -a --delete /tmp/empty/ directory/to/delete/
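    Keep the trailing slash on the source (/tmp/empty/) : it means "the contents of this directory". Without it, rsync copies the empty directory itself into directory/to/delete/ and leaves the existing files alone. The trailing slash on the destination is optional.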
  3. rmdir /tmp/empty
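
The three steps above can be wrapped in a small shell function. This is only a minimal sketch : the name rsync_empty_dir is made up for illustration, and it assumes rsync and mktemp are available.

```shell
# Hypothetical helper: empty a directory by syncing an empty one over it.
# Usage: rsync_empty_dir directory/to/delete
rsync_empty_dir() (
    target=$1
    # Refuse to run on a missing directory (or an empty argument).
    [ -d "$target" ] || { echo "not a directory: $target" >&2; exit 1; }

    empty=$(mktemp -d) || exit 1
    # The trailing slash on "$empty/" means "the contents of the empty directory".
    rsync -a --delete "$empty/" "$target/"
    status=$?
    rmdir "$empty"
    exit "$status"
)
```

The function body runs in a subshell, so its exit calls do not terminate the calling shell. Because of -a, the target directory also takes on the permissions and modification time of the temporary directory, which hardly matters if the target is removed with rmdir right afterwards.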

rsync

Usage :

rsync is a great tool to synchronize directory trees on a local machine or between a local and a remote machine. It allows :

By default, rsync transfers files having either a different size or time of last modification between the source and the destination. This is known as rsync's quick check (details : man -P 'less -p "quick check"' rsync).
rsync can be instructed to use other criteria to match / skip files :

When gathering files in a temporary directory in view of rsync'ing them to some distant place (e.g. when developing a delivery script), keep in mind that their modification time is the instant they were cp'ed into that temporary directory, even though their content has not changed. Thus, by default, rsync will always consider them as different from those of the distant storage.
Solution : use the -c flag (compare checksums), or keep the original modification times while gathering the files (cp -p) and let rsync preserve them on the destination with -t (implied by -a).
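
The effect can be reproduced locally. The sketch below is illustrative only : the helper name count_would_transfer is an assumption, and it relies on --dry-run combined with --itemize-changes to list what rsync would send.

```shell
# Hypothetical helper: print how many regular files rsync would (re)transfer
# from directory $1 to directory $2, without transferring anything (-n).
# Extra rsync flags (for example -c) can be passed as further arguments.
count_would_transfer() {
    src=$1; dst=$2; shift 2
    # With -i, the line of a regular file that would be transferred
    # to the receiver starts with ">f".
    rsync -rt -n -i "$@" "$src/" "$dst/" | grep -c '^>f'
}
```

With a staging directory filled by a plain cp, count_would_transfer staging destination reports the unchanged files again ; after cp -p, or with -c as a third argument, they are no longer counted.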

Flags :

Flag Usage
-a --archive archive mode, equivalent to -rlptgoD, i.e. :
recursive + preserves (symlinks + permissions + times + group + owner + devices and special files)
--backup
  1. make a backup of the destination file if :
    • it changes (changed either on the source or destination)
    • it is deleted (deleted from source + --delete flag)
  2. apply the corresponding suffix
To be used with --backup-dir and --suffix. See also some extra notes.
--backup-dir=dir make backups in a hierarchy based on dir (default : each backup stays in the same directory as the file it replaces)
--bwlimit=n limit I/O bandwidth to n KBytes per second
-c --checksum skip based on checksum, not modification time or size (i.e. if checksums match on both ends, skip file)
-D short for --devices --specials
--delete delete files that don't exist on sender
This conflicts with options altering the list of files to synchronize, such as --include-from, --exclude, ...
--devices transfer character and block device files to the remote system to recreate these devices. Requires root privileges
--exclude=pattern exclude files matching pattern from the list of files to be transferred. Use as many --exclude options as the number of patterns to exclude.
--exclude-from=file exclude files matching patterns listed in file
--files-from=file read list of source file names from file
file must contain an explicit list of all files to handle. For pattern-matching on file names, consider --include-from
-g --group preserve group
-H --hard-links preserve hard links
-I --ignore-times synchronize everything, skip nothing
This option name is a little puzzling since it suggests that only the file size will be used to skip files, which is not true : files having the same size will be sent anyway. If you actually want to only consider the file size to skip files, consider --size-only.
-i --itemize-changes list the changed files and the cause of each change, coded as an 11-character string
details : man -P 'less -p "-i, --itemize-changes$"' rsync
--ignore-existing ignore files that already exist on receiver
--include-from=file include files matching patterns listed in file
-L --copy-links dereference symlinks (i.e. transform symlink into referent file or directory)
-l --links copy symlinks as symlinks
-n --dry-run
  • show what would have been transferred
  • often used with --verbose and --itemize-changes
  • since no data is transferred in this mode :
    • --progress has no effect
    • statistics about sent/received bytes and speedup are wrong
-O --omit-dir-times omit directories from --times
-o --owner preserve owner (root only)
-p --perms preserve permissions
--progress show progress during transfer (for each file)
-q --quiet suppress non-error messages
-r --recursive recurse into directories
--remove-source-files sender removes synchronized files (non-dir)
--size-only transfer only files that have changed in size (modification times are ignored). This is useful when starting to use rsync after using another mirroring system which may not preserve timestamps exactly.
--specials transfer special files such as named sockets and fifos
--suffix=suffix append suffix to backup file names. Defaults :
  • (empty string) when --backup-dir is specified (hence which files are backups is explicit)
  • ~ otherwise
-t --times preserve times
-u --update update only : don't overwrite newer files
-v --verbose verbose mode
-W --whole-file copy files whole : disables the incremental algorithm (source)
-z --compress compress data
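
To make the three backup-related flags (--backup, --backup-dir, --suffix) concrete, here is a minimal sketch of a mirror that keeps replaced and deleted files aside. The function name mirror_with_backups, the dated suffix and the three directory arguments are illustrative assumptions, nothing prescribed by rsync.

```shell
# Hypothetical helper: mirror directory $1 into directory $2. Files that would
# be overwritten or deleted in $2 are moved into $3 first, with a dated suffix.
mirror_with_backups() {
    src=$1; dst=$2; backups=$3
    suffix=".$(date +%Y%m%d)"
    # Pass an absolute path for $backups, outside of $dst: a relative
    # --backup-dir is interpreted relative to the destination directory.
    rsync -a --delete \
          --backup --backup-dir="$backups" --suffix="$suffix" \
          "$src/" "$dst/"
}
```

After a run, the destination matches the source, and the previous versions sit under the backup directory with the same relative paths plus the suffix.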

Example :

Basic usage, subtlety about the trailing slash / :

rsync -avz /source/directory /destination/directory
Recursively transfer all files from /source/directory into /destination/directory in archive mode + use compression to reduce the size of data portions of the transfer.
rsync -avz /source/directory/ /destination/directory
A trailing slash / on the source avoids creating an additional directory level at the destination. Think of it as "copy the contents of this directory" as opposed to "copy the directory itself". In both cases the attributes of the containing directory are transferred to the containing directory on the destination.
These commands are equivalent :
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
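
The difference is easy to check in a scratch directory. The following is a small self-contained sketch ; the function name trailing_slash_demo and the directory names inside it are made up for the demonstration.

```shell
# Hypothetical demo: copy the same source twice, with and without a trailing
# slash, into two scratch destinations, then list what each one received.
trailing_slash_demo() (
    work=$(mktemp -d) || exit 1
    mkdir -p "$work/src/foo" "$work/without" "$work/with"
    echo data > "$work/src/foo/file.txt"

    rsync -a "$work/src/foo"  "$work/without"   # copies the directory itself
    rsync -a "$work/src/foo/" "$work/with"      # copies only its contents

    (cd "$work" && find without with | LC_ALL=C sort)
    rm -rf "$work"
)
```

Calling trailing_slash_demo lists without/foo/file.txt for the first command and with/file.txt for the second : only the copy without the trailing slash gains the extra foo level.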

Local to local, with a FAT32 destination

rsync -rvz --size-only /source/directory /destination/directory
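
As an alternative to --size-only, rsync has a --modify-window option that treats two timestamps as equal when they differ by no more than the given number of seconds ; the rsync manual suggests --modify-window=1 for FAT filesystems, which store modification times with a 2-second resolution. A sketch, with a made-up function name sync_to_fat :

```shell
# Hypothetical helper: sync directory $1 into directory $2, where $2 lives on
# a FAT filesystem. Extra rsync flags can be passed as further arguments.
sync_to_fat() {
    src=$1; dst=$2; shift 2
    # -r recurse, -t keep modification times (needed by the quick check).
    # Owner, group, permissions and symlinks are left out on purpose,
    # since FAT cannot store them.
    # --modify-window=1 : timestamps up to 1 second apart count as equal.
    rsync -rt --modify-window=1 "$@" "$src/" "$dst/"
}
```

Unlike --size-only, this still notices a file whose content changed while its size stayed the same, as long as its modification time moved by more than a second.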

Remote to local over SSH

rsync -avz -e ssh [source] [destination]

To skip password input :

  1. setup SSH accounts + SSH keys on local + remote hosts
  2. build a custom SSH configuration file
  3. instruct rsync over SSH to use this configuration file (mind the quotes) :
    rsync -avz -e "ssh -F /home/bob/.ssh/config -i /home/bob/.ssh/id_rsa" bob@remoteHost:/path/to/file /destination/directory
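
Step 2 can be sketched as a function that writes a minimal client configuration. The function name write_ssh_config and its arguments are assumptions for illustration ; Host, User, IdentityFile and IdentitiesOnly are standard ssh_config keywords.

```shell
# Hypothetical helper: write a minimal SSH client configuration to file $1.
# $2 = host name, $3 = remote user, $4 = path to the private key.
# Warning: any existing file at $1 is overwritten.
write_ssh_config() {
    cat > "$1" <<EOF
Host $2
    User $3
    IdentityFile $4
    IdentitiesOnly yes
EOF
    # Keep the file private, as is customary for files under ~/.ssh.
    chmod 600 "$1"
}
```

For example, write_ssh_config /home/bob/.ssh/rsync_config remoteHost bob /home/bob/.ssh/id_rsa (the file name rsync_config is hypothetical), after which rsync -avz -e "ssh -F /home/bob/.ssh/rsync_config" remoteHost:/path/to/file /destination/directory picks up the user and the key from that file.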

When rsyncing over SSH in a script like :
#!/usr/bin/env bash

tmpDirectory='/path/to/directory/'

rsync -avz -e 'ssh -i /home/bob/.ssh/id_rsa' bob@serverA:/etc/lighttpd/*conf "$tmpDirectory"
rsync -avz -e 'ssh -i /home/bob/.ssh/id_rsa' "$tmpDirectory" bob@serverB:/etc/lighttpd/
ssh -i /home/bob/.ssh/id_rsa bob@serverB 'sudo /etc/init.d/lighttpd restart'
  • works if the commands are copy-pasted into a terminal
  • works if the final SSH command is commented out
  • fails if the final SSH command doesn't explicitly specify the key with -i /path/to/privateKey : the 2nd rsync then prompts for a password. My explanation (to be confirmed) : SSH, which is used by all 3 commands, doesn't like "mixed" key specification (explicit / implicit)
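
One way to rule out any "mixed" key specification is to define the SSH command once and reuse it for all 3 commands. This sketch does not confirm the explanation above, it only keeps the key specification identical everywhere ; the variable names sshKey and sshCmd and the function name deploy_lighttpd_conf are made up.

```shell
# Single place where the key is named, reused by every command below.
sshKey='/home/bob/.ssh/id_rsa'
sshCmd="ssh -i $sshKey"

# Hypothetical helper: $1 is the temporary directory used as a relay.
deploy_lighttpd_conf() {
    tmpDirectory=$1
    # The remote wildcard is quoted so that the local shell leaves it alone.
    rsync -avz -e "$sshCmd" 'bob@serverA:/etc/lighttpd/*conf' "$tmpDirectory/" &&
    rsync -avz -e "$sshCmd" "$tmpDirectory/" bob@serverB:/etc/lighttpd/ &&
    # Unquoted on purpose: $sshCmd must split into "ssh", "-i" and the key path.
    $sshCmd bob@serverB 'sudo /etc/init.d/lighttpd restart'
}
```

Each step only runs if the previous one succeeded, so a failed transfer does not trigger a restart of lighttpd on serverB.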

Copy a whole file tree to an empty destination (source)

Turn off the metadata checks and the rsync block checksum algorithm entirely and transfer all of the bits at the source to the dest without any CPU penalty :
Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.
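
Going only by the flag table above, one combination that matches this description is -W (no incremental algorithm) with -I (no skipping based on size and time), plus -H for hard links. The sketch below is an assumption built from those flags, not a quotation of the referenced source, and the function name copy_tree_whole is made up.

```shell
# Hypothetical helper: first-time copy of a whole tree from $1 into $2.
copy_tree_whole() {
    src=$1; dst=$2
    # -a archive mode
    # -H also preserve hard links (not implied by -a)
    # -W send whole files, without the incremental algorithm
    # -I do not skip files based on size and modification time
    rsync -a -H -W -I "$src/" "$dst/"
}
```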