Borg - aka "BorgBackup" : Deduplicating archiver with compression and encryption


Borg usage

Installed with the Debian package borgbackup

Usage

Permissions needed to back up (source) :

  • your own files : just run Borg as your normal user
  • files of other users or the operating system : running Borg as root will be required

initialize a repository

A repository can be either a local directory or a remote storage accessed via ssh (details). Local and "ssh" repositories are referred to with different syntaxes (e.g. /path/to/repo vs ssh://user@host/path/to/repo). The examples below use a local repository.
borg init --encryption=mode /path/to/repo
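Borg also reads the repository location from the BORG_REPO environment variable: when it is set, the repository argument can be omitted and archives can be written as ::archiveName, so switching between a local and an ssh repository becomes a one-line change. A minimal sketch, assuming repokey mode and placeholder paths, user and host; the helper name init_repo is hypothetical:

```shell
#!/bin/sh
# Repository location, used by borg commands when no repository is given.
# Local repository (placeholder path):
BORG_REPO=${BORG_REPO:-/path/to/repo}
# Remote repository over ssh instead (placeholder user and host):
#   BORG_REPO=ssh://user@host/path/to/repo
export BORG_REPO

# Hypothetical helper: initialize the repository named by BORG_REPO.
init_repo() {
    # repokey is only an assumed default; the mode can't be changed later.
    mode=${1:-repokey}
    borg init --encryption="$mode"
}
```

Once BORG_REPO is exported, later commands can be shortened accordingly, e.g. borg create --stats ::archiveName path/to/data/to/backup or simply borg list.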

make a backup (aka archive)

borg create --stats --progress /path/to/repo::archiveName path/to/data/to/backup
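  • archiveName : for a daily backup, the archive name may be something like yyyy-mm-dd, which can be achieved with the shell command $(date +%Y-%m-%d) or with Borg's own {now:%Y-%m-%d} placeholder (e.g. /path/to/repo::{now:%Y-%m-%d})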
  • --stats and --progress are there to make Borg more verbose (it's pretty quiet by default)
  • archives are compressed with the default settings (lz4) when --compression is omitted
  • you may also make Borg even more verbose with -v and --list
This outputs :
Creating archive at "/path/to/repo::archiveName"

(status + list of files if --list was used)

------------------------------------------------------------------------------
Repository: /path/to/repo
Archive name: archiveName
Archive fingerprint: 5fed43cf97cc903917073a436f9255879ff14b92505750c39f41db80008c3659
Time (start): Sun, 2024-04-28 11:28:27
Time (end):   Sun, 2024-04-28 11:28:27
Duration: 0.36 seconds
Number of files: 666
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              458.04 MB            433.47 MB                563 B
All archives:              916.09 MB            866.95 MB            403.56 MB

                       Unique chunks         Total chunks
Chunk index:                     717                 1538
------------------------------------------------------------------------------
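The borg create options above can be wrapped into a small daily backup routine. A sketch under assumptions: the repository and data paths are placeholders, the helper name daily_backup is hypothetical, and the archive is named with Borg's {now:%Y-%m-%d} placeholder (quoted so the shell leaves the braces alone). Borg exits with 0 on success, 1 on warnings (e.g. a file changed while it was read) and 2 on errors; the routine reports and passes on that code.

```shell
#!/bin/sh
# Placeholder locations.
REPO=/path/to/repo
DATA=path/to/data/to/backup

# Hypothetical helper: one archive per day, named yyyy-mm-dd by Borg.
daily_backup() {
    ret=0
    borg create --stats "${REPO}::{now:%Y-%m-%d}" "$DATA" || ret=$?
    # Borg exit codes: 0 = success, 1 = warning, 2 = error.
    if [ "$ret" -eq 1 ]; then
        echo "borg create finished with warnings" >&2
    elif [ "$ret" -ge 2 ]; then
        echo "borg create failed (exit code $ret)" >&2
    fi
    return "$ret"
}
```

Archive names must be unique within a repository, so running this twice on the same day makes the second borg create fail; a name such as {now:%Y-%m-%dT%H:%M} avoids that if several backups per day are wanted.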

about the Deduplicated size :

                       Original size      Compressed size    Deduplicated size
This archive:              458.04 MB            433.47 MB                563 B
All archives:              916.09 MB            866.95 MB            403.56 MB
Deduplicated size of :
  • This archive : amount of data stored only for this archive, i.e. chunks that exist only in this archive
  • All archives : amount of data stored in the repo, i.e. all chunks in the repository
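The two "Deduplicated size" figures can be reproduced with a toy model: each archive references a set of chunks, the archive's deduplicated size is the total size of the chunks no other archive references, and the repository-wide figure is the total size of all distinct chunks. A sketch only: the input format (one "archive chunk_id size" line per reference) and the helper name dedup_sizes are hypothetical, and real Borg takes (compressed) chunk sizes and reference counts from its chunks cache.

```shell
#!/bin/sh
# Hypothetical helper: dedup_sizes ARCHIVE < references
# Input lines: "archive chunk_id size". Prints two numbers:
#   1. size of the chunks used by ARCHIVE and by no other archive
#   2. size of all distinct chunks in the toy repository
dedup_sizes() {
    awk -v target="$1" '
        {
            size[$2] = $3                   # a chunk always has the same size
            if (!(($1, $2) in seen)) {      # count each archive once per chunk
                seen[$1, $2] = 1
                owners[$2]++                # number of archives using the chunk
                if ($1 == target) mine[$2] = 1
            }
        }
        END {
            this = 0; all = 0
            for (c in size) {
                all += size[c]
                if ((c in mine) && owners[c] == 1) this += size[c]
            }
            print this, all
        }'
}
```

In the sample output above, This archive shows a deduplicated size of 563 B: nearly every chunk of that archive was already referenced by another archive, so it added almost nothing to the repository.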

get information about an archive

borg info /path/to/repo::archiveName

list archives

borg list /path/to/repo

list contents of an archive

borg list /path/to/repo::archiveName
add --json-lines to get detailed, machine-readable output (one JSON object per item)

check

  • a single archive :
    borg check /path/to/repo::archiveName
  • the repository itself (i.e. all archives) :
    borg check /path/to/repo

compare archives

borg diff /path/to/repo::archive1 archive2

restore files

with extract

borg extract /path/to/repo::archiveName
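Since borg extract always writes into the current directory, restoring somewhere else means changing directory first, ideally in a subshell so the caller's working directory is left alone. A sketch with a hypothetical helper name restore_to and placeholder paths; any extra arguments are the paths to extract, as stored in the archive (Borg stores them without the leading /):

```shell
#!/bin/sh
# Hypothetical helper: restore_to TARGET_DIR REPO::ARCHIVE [PATH...]
restore_to() {
    target=$1
    shift
    mkdir -p "$target" || return
    # The subshell limits the cd to borg extract; the caller stays put.
    (cd "$target" && borg extract "$@")
}
```

For example, restore_to /tmp/restore /path/to/repo::archiveName home/user/Documents would recreate home/user/Documents under /tmp/restore.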

with mount

This is not exactly a file restore command, but once an archive is mounted, you can browse its contents and copy files as you wish.
  1. borg mount /path/to/repo::archiveName /mount/point
  2. browse / copy files
  3. fusermount -u /mount/point
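The three mount steps can be scripted so that the archive is unmounted even when the copy fails. A sketch assuming FUSE is available; the helper name copy_from_archive and all paths are placeholders, the mount point is a temporary directory, and borg umount could be used instead of fusermount -u:

```shell
#!/bin/sh
# Hypothetical helper: copy_from_archive REPO::ARCHIVE PATH_IN_ARCHIVE DEST
copy_from_archive() {
    archive=$1 src=$2 dest=$3
    mnt=$(mktemp -d) || return
    borg mount "$archive" "$mnt" || { rmdir "$mnt"; return 1; }
    # Remember the copy result, then always unmount and clean up.
    ret=0
    cp -a "$mnt/$src" "$dest" || ret=$?
    fusermount -u "$mnt"
    rmdir "$mnt"
    return "$ret"
}
```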

prune repository

You don't need to keep every archive you ever made. Define a retention policy and enforce it with :
borg prune [options] [policy] /path/to/repo
How does Borg's prune work with deduplication?
  • the magic is that files and chunks don't actually belong to an archive; they are just referenced by it
  • so any file or chunk can be (and usually is) referenced by multiple archives, be it archive_1, archive_2 or archive_n : this is deduplication at work !
  • since an archive is basically a list of chunk references, pruning an archive deletes that list and decrements the reference count of each chunk it points to; chunks still referenced by other archives remain untouched
  • What happens to orphan chunks (i.e. chunks that are not referenced anymore) ?
    Their reference count drops to 0, so they are marked as deleted in the repository (DELETE entries in the segments); borg compact then rewrites the segments to actually free their space. Related discussions :
    https://www.reddit.com/r/BorgBackup/comments/17zoswk/how_does_deduplication_work_if_you_delete_a_backup/
    https://github.com/systemd/casync/issues/43
Storage space is not freed until you run borg compact.
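A retention routine therefore runs borg prune and then borg compact. A sketch with a hypothetical helper name apply_retention and an example policy only (7 daily, 4 weekly, 6 monthly archives); extra arguments such as --dry-run are passed on to prune, and compact is skipped in that case since a dry run deletes nothing:

```shell
#!/bin/sh
REPO=/path/to/repo   # placeholder

# Hypothetical helper; the --keep-* values are only an example policy.
apply_retention() {
    borg prune --list --keep-daily=7 --keep-weekly=4 --keep-monthly=6 \
        "$@" "$REPO" || return
    # A dry run deletes nothing, so there is nothing to compact.
    case " $* " in
        *" --dry-run "* | *" -n "*) return 0 ;;
    esac
    # prune removes archives; only compact actually frees disk space.
    borg compact "$REPO"
}
```

Running apply_retention --dry-run first shows which archives would be kept or pruned without touching the repository.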

delete archive or repository

  • archive : borg delete /path/to/repo::archiveName
  • repository : borg delete /path/to/repo

Borg commands and flags

Flags

Command Flag Usage
check (none)
  • verify the consistency of a repository and its archives. It consists of two major steps :
    1. check the consistency of the repository itself
    2. check the consistency and correctness of
      • the archive metadata
      • archive data, with --verify-data
  • both steps can also be run independently with --repository-only or --archives-only
diff find differences (file contents, user/group/mode) between archives
extract (none) extract the contents of an archive
  • by default, the whole archive is extracted
  • to extract only a subset :
    • provide a list of files / directories to extract
    • use the --pattern* / --exclude* options
data is written to the current directory; there is no option to specify an output directory
-n --dry-run do not actually change any file
info display detailed information about the specified archive (or about the whole repository when no archive is given)
init --encryption=mode with mode:
  • none : anyone can read or alter archives
  • authenticated : archives are not encrypted but modifications will be detected
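  • keyfile : stores the encrypted key in ~/.config/borg/keys/ (on the client)
  • repokey : stores the encrypted key in the repository, in repoDir/config
  • repokey-blake2 : like repokey, but uses BLAKE2b-256 instead of HMAC-SHA256 for authentication, which is usually faster on CPUs without SHA-256 hardware acceleration (keyfile-blake2 and authenticated-blake2 variants also exist)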
mode can only be specified when initializing a new repository and can't be changed afterwards
list (none) list the contents of a repository or an archive
--json format output as JSON when listing the contents of a repository
--json-lines format output as JSON when listing the contents of an archive
mount mount archive as a FUSE filesystem to browse its contents or restore individual files
prune --keep-period=n keep n period archives, with period : hourly, daily, weekly, monthly, yearly
  • a daily archive to keep is the latest archive made on a given day (and so on with other values of period)
  • n is the number of archives to keep, however often archives are made :
    with --keep-daily=7, the oldest kept archive will be
    • about 1 week old if we back up every day
    • about 2 months (60 days) old if we back up every 10 days
    n counts archives (at most one per day that has a backup), not calendar days
-n --dry-run do not actually change any file
-p --progress show progress
-v --verbose --info verbose mode : display INFO level log entries
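The "n counts archives, not periods" rule of --keep-daily can be simulated in a few lines: walk the archives from newest to oldest, keep the first (i.e. latest) archive of each day, and stop once n archives have been kept. A simplified sketch of a single rule only (real prune combines several --keep-* rules); the input format (one "YYYY-MM-DD HH:MM name" line per archive, newest first) and the helper name keep_daily are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: keep_daily N < archives
# Input: "YYYY-MM-DD HH:MM name" lines, newest first.
# Prints the archive names a --keep-daily=N rule alone would keep.
keep_daily() {
    awk -v n="$1" '
        kept < n + 0 && $1 != last_day {   # latest archive of a new day
            print $3
            last_day = $1
            kept++
        }'
}
```

With one archive every 10 days, keep_daily 7 keeps 7 archives spanning 60 days, since days without a backup are simply not counted.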

Borg concepts and glossary

archive
  • what you get after a single backup operation (run with borg create)
  • contains a backup copy (i.e. a snapshot of the data), hence also simply called a backup
  • one can later extract or mount an archive to restore data
  • Borg archives take advantage of (source) :
    • deduplication : any file chunk is stored only once, allowing daily backups since only changes are stored
    • compression : save space (at the cost of speed / CPU usage)
    • authenticated encryption : you may store archives on not fully trusted targets
caches
  • files cache : stored in cache/files and is used at backup time to quickly determine whether a given file is unchanged and we have all its chunks
  • chunks cache : stored in cache/chunks and is used to determine whether we already have a specific chunk
chunks
  • The object graph
  • Files are sliced into chunks (this is the root of deduplication) : when creating a new archive, if one of these slices of data is already stored as an existing chunk, that chunk is simply referenced by the new archive and uses no additional storage space. (inspired by, details)
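Chunk-level deduplication can be sketched with standard tools: split a file into chunks, name each chunk by its SHA-256 hash, and only store chunks whose hash is not already present. Assumptions: fixed-size 4 KiB chunks via split (Borg actually uses content-defined chunking with a rolling hash, so inserting data near the start of a file does not shift every later chunk), GNU coreutils sha256sum, a plain directory as the "repository", no compression or encryption, and a hypothetical helper name store_file:

```shell
#!/bin/sh
# Hypothetical helper: store_file FILE STORE_DIR
# Prints how many chunks were newly stored (0 = fully deduplicated).
store_file() {
    file=$1 store=$2
    tmp=$(mktemp -d) || return
    mkdir -p "$store"
    # Fixed-size 4 KiB chunks (a simplification of Borg's chunker).
    split -b 4096 "$file" "$tmp/chunk."
    new=0
    for c in "$tmp"/chunk.*; do
        [ -e "$c" ] || continue                 # empty file: no chunks
        id=$(sha256sum "$c" | cut -d' ' -f1)    # chunk ID = content hash
        if [ ! -e "$store/$id" ]; then          # unknown chunk: store it
            mv "$c" "$store/$id"
            new=$((new + 1))
        fi                                      # known chunk: reuse it
    done
    rm -rf "$tmp"
    echo "$new"
}
```

Storing an identical copy of an already stored file adds no new chunk, which is exactly why an unchanged file costs (almost) nothing in a new archive.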
repository
segment
  • Transactionality is achieved by using a log (aka journal) to record changes. The log is a series of numbered files called segments. Each segment is a series of log entries : the segment number together with the offset of each entry relative to its segment start establishes an ordering of the log entries. (source)
  • Borg stores its data in a repository : a filesystem-based transactional key-value store. Thus the repository does not know about the concept of archives or items. Objects referenced by a key are stored inline in files (segments) of approx. 500 MB size in numbered subdirectories of /path/to/repo/data.
  • Segments are strictly append-only.
  • Each log entry is like :
    CRC32 of log entry|entry size|tag|object key|data
    with :
    • tag : one of
      • PUT : the log entry adds data
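      • DELETE : the log entry marks the object with this key as deleted but doesn't actually remove its data (the space is reclaimed when segments are compacted)
      • COMMIT : ends a transaction
    • object key : for PUT and DELETE entries only
    • data : for PUT entries only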
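The transactional behaviour of the segment log can be illustrated by replaying a simplified text log: PUT and DELETE entries are buffered and only applied to the key/value state when a COMMIT follows, so entries written after the last COMMIT (e.g. by an interrupted backup) are discarded. A sketch only: the one-entry-per-line text format and the helper name replay_log are hypothetical, while real segments are binary, with a CRC32, an entry size and 32-byte keys per entry.

```shell
#!/bin/sh
# Hypothetical helper: replay_log < log
# Input lines: "PUT key", "DELETE key" or "COMMIT".
# Prints the keys present after applying committed transactions only.
replay_log() {
    awk '
        $1 == "PUT"    { pending[++n] = "+" $2 }
        $1 == "DELETE" { pending[++n] = "-" $2 }
        $1 == "COMMIT" {                       # apply the whole transaction
            for (i = 1; i <= n; i++) {
                key = substr(pending[i], 2)
                if (substr(pending[i], 1, 1) == "+") state[key] = 1
                else delete state[key]
            }
            n = 0                              # start a new transaction
        }
        END {                                  # uncommitted entries are dropped
            for (k in state) print k
        }' | sort
}
```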