 ---
 layout: post
 title: "Encrypted remote backup with rsync and dm-crypt: Part 1/2"
 date: 2013-02-20 22:41
 comments: true
 categories: [server, shell]
 cover: /images/cover/avatar.png
 keywords: rsync, backup, timemachine, linux, ssh, hardlinks, encrypt
 description: Backing up data to server
 ---
 
 # Choose the right tool
 
 Rsync is a great tool for backups. It can transfer data to a remote machine
 securely over SSH. It also offers the ```--link-dest``` option, which
 makes sure unchanged files are not duplicated on the filesystem thanks to [hard links](http://en.wikipedia.org/wiki/Hard_link).
 By the way, the proprietary [Apple Time Machine](http://www.apple.com/osx/apps/#time-machine) works the same way.
 
 # Start writing a script
 
 Of course, we will use many other useful options, not just ```--link-dest```. Here's a simplified version of my
 backup script:
 
 {% codeblock lang:bash %}{% raw %}
 #!/usr/bin/env bash
 date=$(date "+%Y-%m-%d_%H:%M")
 
 rsync -av \
  --delete \
  --delete-excluded \
  --compress-level=9 \
  --numeric-ids \
  --rsync-path="sudo rsync" \
  --exclude-from=/root/.rsync/home-cinan \
  --link-dest=/mnt/current-backup/home \
  /home/cinan -e ssh \
  sync-user@machine:incomplete_backup-$date/home/
 
 # Run on the backup machine: mark the backup as complete and repoint the symlink
 ssh sync-user@machine "mv incomplete_backup-$date backup-$date && rm -rf current-backup && ln -s backup-$date current-backup"
 {% endraw%}{% endcodeblock %}
 
 <!-- more -->
 
 Explanation:
 
 - ```-a``` the usual option for backups: archive mode (recursive copy, copy symlinks
   as symlinks, preserve permissions, timestamps and ownership). Preserving ownership works only if rsync
   runs with root rights on the backup machine -- see the ```--rsync-path``` option.
 - ```-v``` be verbose, but not too much
 - ```--delete``` delete extraneous files from destination directories
 - ```--delete-excluded``` also delete excluded files from destination directories
 - ```--compress-level=9``` highest compression level (CPU is fast, network ain't)
 - ```--numeric-ids``` transfer numeric uid/gid values instead of mapping them by user/group name
 - ```--rsync-path=<some-path>``` the command used to start rsync on the backup machine; to preserve
   ownership it has to run as root, hence ```sudo rsync```.
 - ```--exclude-from=<some-path>``` exclude some directories and files from backup
 - ```--link-dest=<some-path>``` magic. Hard-link unchanged files to the files in the
   \<some-path\> directory instead of copying them again
 - ```/home/cinan``` directory to back up
 - ```-e ssh``` specify the remote shell to use
 - ```<path>``` directory where the backup will be saved
 - ```mv ... && rm ... && ln``` run on the backup machine to mark the newest backup as complete. Delete the old
   current-backup symlink (don't worry, you won't lose your data, it's just a symlink)
   and point current-backup at the newest backup (useful for future
   backups, which will use this directory in ```--link-dest```).
 
 *The complete script is at the bottom of this article.*
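To make the ```--exclude-from``` option from the list above more concrete, here is a small local sketch. The directory names and the patterns in the exclude file are made up for the demo (my real exclude file is not shown here), and everything happens inside a temporary directory, so no remote machine is needed.

```bash
#!/usr/bin/env bash
# Local demo of --exclude-from; all paths and patterns are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src/cinan/.cache" "$work/src/cinan/docs" "$work/dst"
echo "temporary" > "$work/src/cinan/.cache/junk"
echo "important" > "$work/src/cinan/docs/notes.txt"

# One pattern per line; blank lines and lines starting with # or ; are ignored.
cat > "$work/exclude-list" <<'EOF'
# skip cache directories anywhere in the tree
.cache/
*.tmp
EOF

rsync -a --exclude-from="$work/exclude-list" "$work/src/cinan" "$work/dst/"

# Only docs/notes.txt should show up in the destination.
find "$work/dst" -type f
```

A pattern ending with a slash matches directories only, and a pattern without a leading slash can match at any depth of the transferred tree.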
 
 Keep an eye on slashes! There's a huge difference between /home/cinan and
 /home/cinan/ (source directory). Without the final slash, rsync will copy the
 directory in its entirety. With the trailing slash, it will copy the contents of
 the directory but won't recreate the directory.
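If you want to see the trailing-slash difference for yourself, the following sketch copies the same directory twice on the local machine. The directory names are invented for the demo.

```bash
#!/usr/bin/env bash
# Local demo of the trailing slash on the source; directory names are made up.
work=$(mktemp -d)
mkdir -p "$work/cinan" "$work/without-slash" "$work/with-slash"
echo "hello" > "$work/cinan/file.txt"

# No trailing slash: the directory itself is recreated in the destination.
rsync -a "$work/cinan" "$work/without-slash/"

# Trailing slash: only the contents of the directory are copied.
rsync -a "$work/cinan/" "$work/with-slash/"

ls "$work/without-slash"   # cinan
ls "$work/with-slash"      # file.txt
```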
 
 # Little bit of security
 
 The sync-user user should be able to run ```sudo rsync``` without being asked for a password. Simply
 add this line to ```/etc/sudoers```:
 
 {% codeblock %}
 sync-user ALL= NOPASSWD: /usr/bin/rsync
 {% endcodeblock %}
 
 Before running the script, spend a little more time on security. The
 ```--rsync-path="sudo rsync"``` option can be quite dangerous, because rsync then runs as root on the server.
 Set up rrsync on the server side first. Basically, it allows sync-user to run rsync only inside a defined,
 chroot-like directory. Read more about rrsync [here](http://www.v13.gr/blog/?p=216).
 
 Now you can run the script. The first run can take a long time, but later backups are
 incremental, so rsync transmits only changed files.
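Here is a rough local simulation of two backup runs, in case you want to watch ```--link-dest``` at work before touching a real server. It is only a sketch: the paths are invented, both "backups" live in a temporary directory, and no SSH is involved.

```bash
#!/usr/bin/env bash
# Local simulation of two backup runs; all paths are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src"
echo "never changes" > "$work/src/stable.txt"
echo "version 1"     > "$work/src/changing.txt"

# First run: a full copy.
rsync -a "$work/src/" "$work/backup-1/"

# Change one file, then run again with --link-dest pointing at the first backup.
echo "version 2, now longer" > "$work/src/changing.txt"
rsync -a --link-dest="$work/backup-1" "$work/src/" "$work/backup-2/"

# The first column is the inode number: stable.txt shares one inode across
# both backups, changing.txt gets a new one in backup-2.
ls -i "$work"/backup-*/stable.txt "$work"/backup-*/changing.txt
```

The old version of the changed file stays untouched in the first backup, which is exactly what makes the snapshots usable as history.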
 
 # Impact of hard links
 
 Every run of the backup process creates a new directory called backup-$date.
 Thanks to that, it's really easy to get the files from 22nd Oct 2013 or 29th Nov 2013.
 It is also a space-efficient solution because of hard links. If the file ```dir/a```
 exists unchanged in the backups from 1st Jan 2013 and from 10th Feb 2013, its data
 is stored on the HDD only once. However, deleting the file from the January
 backup directory does not delete it from the February backup directory --
 the February file is preserved.
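The deletion behaviour can be checked with plain ```ln```, without rsync at all. In this sketch the two directories merely stand in for a January and a February backup; the names are made up.

```bash
#!/usr/bin/env bash
# Demo with plain ln; the directories stand in for two backup snapshots.
work=$(mktemp -d)
mkdir -p "$work/backup-january/dir" "$work/backup-february/dir"
echo "some data" > "$work/backup-january/dir/a"

# Hard-link the file into the second snapshot (the same kind of link
# that --link-dest creates for unchanged files).
ln "$work/backup-january/dir/a" "$work/backup-february/dir/a"

# The second column of ls -l is the link count: two names, one copy of the data.
ls -l "$work/backup-february/dir/a"

# Removing the January name only drops one link; the data stays reachable.
rm "$work/backup-january/dir/a"
cat "$work/backup-february/dir/a"   # some data
```

The data blocks are freed only when the last name pointing at them is removed.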
 
 Look at the directory sizes. The first backup is the biggest; the other directories only take up the size of their diffs.
 {% img center /images/backup-2.png Size of my backups %}
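You can get numbers like these with ```du```. As far as I know, du counts every hard-linked file only once per invocation, so the trick is to pass all backup directories in a single call. A small sketch with made-up names and sizes:

```bash
#!/usr/bin/env bash
# Demo of how du reports hard-linked snapshots; names and sizes are made up.
work=$(mktemp -d)
mkdir -p "$work/backup-1" "$work/backup-2"
head -c 1048576 /dev/urandom > "$work/backup-1/big.bin"   # 1 MiB of data
ln "$work/backup-1/big.bin" "$work/backup-2/big.bin"      # unchanged file, hard-linked
echo "new file" > "$work/backup-2/small.txt"              # the only real difference

# Within one invocation each inode is counted once, so backup-2 shows only what is new.
du -sk "$work/backup-1" "$work/backup-2"

size1=$(du -sk "$work/backup-1" "$work/backup-2" | awk 'NR==1 {print $1}')
size2=$(du -sk "$work/backup-1" "$work/backup-2" | awk 'NR==2 {print $1}')
echo "backup-1: ${size1}K, backup-2: ${size2}K"
```

Running du on the second directory alone would report its full size again, because then the shared file is counted for that directory.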
 
 # More complete script
 
 It isn't very robust, but it works and I'm happy with it.
 
 {% codeblock lang:bash %}{% raw %}
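 #!/usr/bin/env bash
 date=$(date "+%Y-%m-%d_%H:%M")
 
 # Command template: EXCLUDE_FROM, FROM and TO are placeholders substituted by sed below
 cmd="rsync -av \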
   --delete \
   --delete-excluded \
   --compress-level=9 \
   --numeric-ids \
   --rsync-path=\"sudo rsync\" \
   --exclude-from=/root/.rsync/EXCLUDE_FROM \
   --link-dest=~/current-backup/TO \
   FROM -e ssh backup@cinan.remote:incomplete_backup-$date/TO/" 
 
 #tuples EXCLUDE_FROM, FROM, TO
 paths=( "home-cinan-exclude"    "\/home\/cinan"         "home"
         "root-exclude"          "\/root\/"              "root"
         "empty"                 "\/var\/spool"          "var"
         "empty"                 "\/var\/lib\/pacman"    "var\/lib"
         "empty"                 "\/boot\/"              "boot"
 )
 let "paths_peak=${#paths[@]} / 3 - 1"
 
 for i in `seq 0 $paths_peak`; do
   EXCLUDE_FROM=${paths[$i*3+0]}
   FROM=${paths[$i*3+1]}
   TO=${paths[$i*3+2]}
 
   ssh backup@cinan.remote "mkdir -p incomplete_backup-$date/$TO"
   eval `echo $cmd | sed -e 's/EXCLUDE_FROM/'"$EXCLUDE_FROM"'/;s/FROM/'"$FROM"'/;s/TO/'"$TO"'/g'` 2>> /tmp/system_backup_errors
 done
 
 ssh backup@cinan.remote "mv incomplete_backup-$date backup-$date && rm -rf current-backup && ln -s backup-$date current-backup"
 {% endraw %}{% endcodeblock %}
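If the ```eval``` and ```sed``` templating feels fragile, the same loop can be written with a bash array and ordinary variables. This is only a sketch of an alternative, not the script I actually run: the ```run``` helper and the ```DRY_RUN``` switch are hypothetical additions so that the commands are printed instead of executed by default, and the stderr redirect from the script above is left out.

```bash
#!/usr/bin/env bash
# Sketch only: same tuples as the script above, but without eval/sed,
# so the paths need no escaped slashes.
date=$(date "+%Y-%m-%d_%H:%M")
remote="backup@cinan.remote"

# Hypothetical safety switch (not in the original script): with DRY_RUN=1 the
# commands are only printed. Set DRY_RUN=0 to really run them.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# tuples EXCLUDE_FROM, FROM, TO
paths=( "home-cinan-exclude"  "/home/cinan"      "home"
        "root-exclude"        "/root/"           "root"
        "empty"               "/var/spool"       "var"
        "empty"               "/var/lib/pacman"  "var/lib"
        "empty"               "/boot/"           "boot"
)

backup_all() {
  local i exclude_from from to
  for ((i = 0; i < ${#paths[@]}; i += 3)); do
    exclude_from=${paths[i]}
    from=${paths[i+1]}
    to=${paths[i+2]}

    run ssh "$remote" "mkdir -p incomplete_backup-$date/$to"
    run rsync -av \
      --delete \
      --delete-excluded \
      --compress-level=9 \
      --numeric-ids \
      --rsync-path="sudo rsync" \
      --exclude-from="/root/.rsync/$exclude_from" \
      --link-dest="~/current-backup/$to" \
      -e ssh \
      "$from" "$remote:incomplete_backup-$date/$to/"
  done

  # Mark the backup as complete and repoint the current-backup symlink.
  run ssh "$remote" "mv incomplete_backup-$date backup-$date && rm -rf current-backup && ln -s backup-$date current-backup"
}

backup_all
```

The ```--link-dest``` value is passed to rsync as the same literal string as in the script above, so the two versions should behave the same on the server. Review the printed commands first, then run with ```DRY_RUN=0```.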
 
 # I want my data encrypted
 
 Check out [part 2/2](http://blog.cinan.sk/2013/06/16/encrypted-remote-backup-with-rsync-and-dm-crypt-part-2-slash-2/).