{"id":307,"date":"2016-09-16T09:19:59","date_gmt":"2016-09-15T23:19:59","guid":{"rendered":"https:\/\/icicimov.com\/blog\/?p=307"},"modified":"2017-07-26T12:38:05","modified_gmt":"2017-07-26T02:38:05","slug":"duplicity-encrypted-backups-to-amazon-s3","status":"publish","type":"post","link":"https:\/\/icicimov.com\/blog\/?p=307","title":{"rendered":"Duplicity encrypted backups to Amazon S3"},"content":{"rendered":"<p><a href=\"http:\/\/duplicity.nongnu.org\/\">Duplicity<\/a> is a tool for creating bandwidth-efficient, incremental, encrypted backups. It backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. And because duplicity uses <a href=\"http:\/\/sourceforge.net\/projects\/librsync\">librsync<\/a>, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. It uses <a href=\"http:\/\/www.gnupg.org\/\">GnuPG<\/a> to encrypt and\/or sign these archives to provide privacy. Different backends like ftp, sftp, imap, s3 and others are supported.<\/p>\n<h2>Prepare S3 bucket and IAM user and policy in Amazon<\/h2>\n<p>First we log in to the Amazon S3 console and create a bucket named <code>my-s3-bucket<\/code>. Then we create an IAM user and attach the following policy to it:<\/p>\n<pre><code>{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Effect\": \"Allow\",\n            \"Action\": \"s3:*\",\n            \"Resource\": [\n                \"arn:aws:s3:::my-s3-bucket\",\n                \"arn:aws:s3:::my-s3-bucket\/*\"\n            ]\n        }\n    ]\n}\n<\/code><\/pre>\n<p>This limits the user&#8217;s access to the created S3 bucket and nothing else. Then we download the user&#8217;s AWS access key and secret access key that we are going to use in our duplicity setup. 
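<\/p>\n<p>The same bucket, user and policy can also be created from the AWS CLI; the commands below are an illustrative sketch (the user name <code>duplicity-backup<\/code> and the <code>policy.json<\/code> file holding the policy above are assumptions, not what I actually used):<\/p>\n<pre><code>aws s3 mb s3:\/\/my-s3-bucket\naws iam create-user --user-name duplicity-backup\naws iam put-user-policy --user-name duplicity-backup \\\n  --policy-name duplicity-s3 --policy-document file:\/\/policy.json\naws iam create-access-key --user-name duplicity-backup\n<\/code><\/pre>\n<p>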
This can be done only once at the time of user creation.<\/p>\n<h2>Installation<\/h2>\n<p>Install from the PPA maintained by the Duplicity team:<\/p>\n<pre><code>root@server01:~# add-apt-repository ppa:duplicity-team\/ppa\nroot@server01:~# aptitude update &amp;&amp; aptitude install -y duplicity python-boto\n<\/code><\/pre>\n<p>Prepare a password to use with the GPG key later (the one given below is of course not the one I used for the server):<\/p>\n<pre><code>igorc@igor-laptop:~\/Downloads$ openssl rand -base64 20\nrwPo1U7+8xMrq6vvuTX9Rj7ILck=\n<\/code><\/pre>\n<p>Create the GPG key:<\/p>\n<pre><code>root@server01:~# gpg --gen-key\ngpg (GnuPG) 1.4.16; Copyright (C) 2013 Free Software Foundation, Inc.\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\n\ngpg: keyring `\/root\/.gnupg\/secring.gpg' created\nPlease select what kind of key you want:\n   (1) RSA and RSA (default)\n   (2) DSA and Elgamal\n   (3) DSA (sign only)\n   (4) RSA (sign only)\nYour selection?\nRSA keys may be between 1024 and 4096 bits long.\nWhat keysize do you want? (2048)\nRequested keysize is 2048 bits\nPlease specify how long the key should be valid.\n         0 = key does not expire\n      &lt;n&gt;  = key expires in n days\n      &lt;n&gt;w = key expires in n weeks\n      &lt;n&gt;m = key expires in n months\n      &lt;n&gt;y = key expires in n years\nKey is valid for? (0)\nKey does not expire at all\nIs this correct? (y\/N) y\n\nYou need a user ID to identify your key; the software constructs the user ID\nfrom the Real Name, Comment and Email Address in this form:\n    \"Heinrich Heine (Der Dichter) &lt;heinrichh@duesseldorf.de&gt;\"\n\nReal name: duplicity\nEmail address:\nComment: Duplicity S3 backup encryption key\nYou selected this USER-ID:\n    \"duplicity (Duplicity S3 backup encryption key)\"\n\nChange (N)ame, (C)omment, (E)mail or (O)kay\/(Q)uit? 
O\nYou need a Passphrase to protect your secret key.\n\ngpg: gpg-agent is not available in this session\nWe need to generate a lot of random bytes. It is a good idea to perform\nsome other action (type on the keyboard, move the mouse, utilize the\ndisks) during the prime generation; this gives the random number\ngenerator a better chance to gain enough entropy.\n\nNot enough random bytes available.  Please do some other work to give\nthe OS a chance to collect more entropy! (Need 128 more bytes)\n................+++++\ngpg: key 1XXXXXXB marked as ultimately trusted\npublic and secret key created and signed.\n\ngpg: checking the trustdb\ngpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model\ngpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u\npub   2048R\/1XXXXXXB 2016-09-15\n      Key fingerprint = 5669 C5C7 FFCC 4698 0E00  BDA2 0CAE 27AC 171E 6C5B\nuid                  duplicity (Duplicity S3 backup encryption key)\nsub   2048R\/5XXXXXX8 2016-09-15\n<\/code><\/pre>\n<p>The GnuPG documentation <a href=\"https:\/\/www.gnupg.org\/documentation\/manuals\/gnupg\/Unattended-GPG-key-generation.html\">Unattended GPG key generation<\/a> explains how to automate the key generation by feeding in an answer file via the <code>--batch<\/code> option. 
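<\/p>\n<p>As a minimal sketch (the parameter values simply mirror the interactive answers above, and the <code>gpg-batch.txt<\/code> file name is an assumption), such an answer file and invocation could look like:<\/p>\n<pre><code>root@server01:~# cat &gt; gpg-batch.txt &lt;&lt;EOF\n%echo Generating the duplicity key\nKey-Type: RSA\nKey-Length: 2048\nSubkey-Type: RSA\nSubkey-Length: 2048\nName-Real: duplicity\nName-Comment: Duplicity S3 backup encryption key\nExpire-Date: 0\nPassphrase: rwPo1U7+8xMrq6vvuTX9Rj7ILck=\n%commit\nEOF\nroot@server01:~# gpg --batch --gen-key gpg-batch.txt\n<\/code><\/pre>\n<p>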
It is also a good idea to install the <code>haveged<\/code> daemon, or <code>rng-tools<\/code> which supplies the <code>rngd<\/code> daemon, to provide enough entropy on the server in case of low activity.<\/p>\n<p>List the keys:<\/p>\n<pre><code>root@server01:~# gpg --list-keys\n\/root\/.gnupg\/pubring.gpg\n------------------------\npub   2048R\/1XXXXXXB 2016-09-15\nuid                  duplicity (Duplicity S3 backup encryption key)\nsub   2048R\/5XXXXXX8 2016-09-15\n<\/code><\/pre>\n<p>Export and email the keys for safe storage:<\/p>\n<pre><code>root@server01:~# gpg --armor --export duplicity | mail -s \"server01 duplicity GPG key\" igorc@encompasscorporation.com\nroot@server01:~# gpg --armor --export-secret-keys duplicity | mail -s \"server01 duplicity private GPG key\" igorc@encompasscorporation.com\n<\/code><\/pre>\n<p>Create the backup dir structure:<\/p>\n<pre><code>root@server01:~# mkdir -p \/bkp\/{backups,duplicity_archives,restore}\nroot@server01:~# mkdir -p \/bkp\/backups\/mongo\n<\/code><\/pre>\n<h2>Backups<\/h2>\n<p>First run, for the document files under <code>\/data<\/code>. 
Duplicity is very flexible and feature rich so we can even specify a backup strategy upon first run, telling it when to take full or incremental backups and how long to retain them:<\/p>\n<pre><code>root@server01:~# export PASSPHRASE=\"rwPo1U7+8xMrq6vvuTX9Rj7ILck=\"\nroot@server01:~# export AWS_ACCESS_KEY_ID=\"my-aws-access-key\"\nroot@server01:~# export AWS_SECRET_ACCESS_KEY=\"my-aws-secret-key\"\nroot@server01:~# cd \/bkp\/backups\nroot@server01:\/bkp\/backups# \/usr\/bin\/duplicity --s3-european-buckets \\\n  --s3-use-new-style --encrypt-key 1XXXXXXB --asynchronous-upload -v 4 \\\n  --archive-dir=\/bkp\/duplicity_archives\/data incr --full-if-older-than 14D \\\n  \/data \"s3+http:\/\/my-s3-bucket\/trtest\/${HOSTNAME}\/data\"\n\nLocal and Remote metadata are synchronized, no sync needed.\nLast full backup date: none\nLast full backup is too old, forcing full backup\n--------------[ Backup Statistics ]--------------\nStartTime 1473912738.45 (Thu Sep 15 05:12:18 2016)\nEndTime 1473912847.27 (Thu Sep 15 05:14:07 2016)\nElapsedTime 108.83 (1 minute 48.83 seconds)\nSourceFiles 9622\nSourceFileSize 1876302897 (1.75 GB)\nNewFiles 9622\nNewFileSize 1876302897 (1.75 GB)\nDeletedFiles 0\nChangedFiles 0\nChangedFileSize 0 (0 bytes)\nChangedDeltaSize 0 (0 bytes)\nDeltaEntries 9622\nRawDeltaSize 1876110545 (1.75 GB)\nTotalDestinationSizeChange 1288698999 (1.20 GB)\nErrors 0\n-------------------------------------------------\n\nroot@server01:\/bkp\/backups#\n<\/code><\/pre>\n<p>The backup is stored in an encrypted archive format in the target S3 bucket; see the screenshot below:<\/p>\n<p><a href=\"https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3.png\" alt=\"\" width=\"1729\" height=\"348\" class=\"aligncenter size-full wp-image-309\" 
srcset=\"https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3.png 1729w, https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3-420x85.png 420w, https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3-744x150.png 744w, https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3-768x155.png 768w, https:\/\/icicimov.com\/blog\/wp-content\/uploads\/2016\/09\/duplicity_encrypted_backup_in_s3-1200x242.png 1200w\" sizes=\"auto, (max-width: 1729px) 100vw, 1729px\" \/><\/a><\/p>\n<p>First run for the Mongo backup; we want to execute it only on a <code>SECONDARY<\/code> server so we include that check too:<\/p>\n<pre><code>root@server01:~# export PASSPHRASE=\"rwPo1U7+8xMrq6vvuTX9Rj7ILck=\"\nroot@server01:~# export AWS_ACCESS_KEY_ID=\"my-aws-access-key\"\nroot@server01:~# export AWS_SECRET_ACCESS_KEY=\"my-aws-secret-key\"\n\nroot@server01:~# [[ $(\/usr\/bin\/mongo --quiet --host 127.0.0.1:27017 admin --eval \\\n  'db.isMaster().ismaster') == \"false\" ]] &amp;&amp; \/usr\/bin\/mongodump --host 127.0.0.1:27017 \\\n  --authenticationDatabase=encompass --username my-user-name --password my-password \\\n  --out \/bkp\/backups\/mongo --oplog\n\nroot@server01:\/bkp\/restore# \/usr\/bin\/duplicity --s3-european-buckets --s3-use-new-style \\\n  --encrypt-key 1XXXXXXB --asynchronous-upload -v 4 --archive-dir=\/bkp\/duplicity_archives\/mongo incr \\\n  --full-if-older-than 14D \/bkp\/backups\/mongo \"s3+http:\/\/my-s3-bucket\/trtest\/${HOSTNAME}\/mongo\"\n\nLocal and Remote metadata are synchronized, no sync needed.\nLast full backup date: none\nLast full backup is too old, forcing full backup\n--------------[ Backup Statistics ]--------------\nStartTime 1473915709.33 (Thu Sep 15 06:01:49 2016)\nEndTime 1473915720.69 (Thu Sep 15 06:02:00 2016)\nElapsedTime 11.36 (11.36 seconds)\nSourceFiles 127\nSourceFileSize 472458569 (451 MB)\nNewFiles 
127\nNewFileSize 472458569 (451 MB)\nDeletedFiles 0\nChangedFiles 0\nChangedFileSize 0 (0 bytes)\nChangedDeltaSize 0 (0 bytes)\nDeltaEntries 127\nRawDeltaSize 472442185 (451 MB)\nTotalDestinationSizeChange 82399352 (78.6 MB)\nErrors 0\n-------------------------------------------------\n\nroot@server01:~#\n<\/code><\/pre>\n<p>We can confirm the backup like this:<\/p>\n<pre><code>root@server01:\/bkp\/restore# PASSPHRASE=\"rwPo1U7+8xMrq6vvuTX9Rj7ILck=\" duplicity list-current-files \\\n  --s3-european-buckets --s3-use-new-style \"s3+http:\/\/my-s3-bucket\/trtest\/${HOSTNAME}\/mongo\"\n\nSynchronizing remote metadata to local cache...\nCopying duplicity-full-signatures.20160915T050149Z.sigtar.gpg to local cache.\nCopying duplicity-full.20160915T050149Z.manifest.gpg to local cache.\nLast full backup date: Thu Sep 15 06:01:49 2016\nThu Sep 15 05:57:49 2016 .\nThu Sep 15 05:57:49 2016 encompass\n[...]\nThu Sep 15 05:57:49 2016 oplog.bson\nroot@server01:\/bkp\/restore#\n<\/code><\/pre>\n<p>The next backups we run will be incremental for the next 14 days, then duplicity will create a new full backup, maintaining up to 4 full backups in the archive.<\/p>\n<p>We can use duplicity to back up Elasticsearch as well:<\/p>\n<pre><code>root@sl02:\/bkp# \/usr\/bin\/duplicity --s3-european-buckets --s3-use-new-style --encrypt-key AXXXXXXB \\\n  --asynchronous-upload -v 4 --archive-dir=\/bkp\/duplicity_archives\/elasticsearch incr \\\n  --full-if-older-than 14D \/var\/lib\/elasticsearch \"s3+http:\/\/my-s3-bucket\/trtest\/${HOSTNAME}\/elasticsearch\"\n\nLocal and Remote metadata are synchronized, no sync needed.\nLast full backup date: none\nLast full backup is too old, forcing full backup\n--------------[ Backup Statistics ]--------------\nStartTime 1473920881.54 (Thu Sep 15 07:28:01 2016)\nEndTime 1473920881.85 (Thu Sep 15 07:28:01 2016)\nElapsedTime 0.31 (0.31 seconds)\nSourceFiles 589\nSourceFileSize 1132601 (1.08 MB)\nNewFiles 589\nNewFileSize 1132601 (1.08 MB)\nDeletedFiles 
0\nChangedFiles 0\nChangedFileSize 0 (0 bytes)\nChangedDeltaSize 0 (0 bytes)\nDeltaEntries 589\nRawDeltaSize 10297 (10.1 KB)\nTotalDestinationSizeChange 11618 (11.3 KB)\nErrors 0\n-------------------------------------------------\n\nroot@sl02:\/bkp#\n<\/code><\/pre>\n<h2>Restore a Backup<\/h2>\n<p>There is no point in a backup we can&#8217;t restore. To recover the backup we don&#8217;t need to provide anything but the passphrase for the GPG encryption key; duplicity knows from its metadata which key to use to decrypt the data.<\/p>\n<h3>Full restore<\/h3>\n<p>Let&#8217;s restore everything from the mongo backup we took previously:<\/p>\n<pre><code>root@server01:~# cd \/bkp\/restore\n\nroot@server01:\/bkp\/restore# mkdir mongo\n\nroot@server01:\/bkp\/restore# PASSPHRASE=\"rwPo1U7+8xMrq6vvuTX9Rj7ILck=\" \\\n  duplicity restore --s3-european-buckets \\\n  --s3-use-new-style \"s3+http:\/\/my-s3-bucket\/trtest\/${HOSTNAME}\/mongo\" mongo\/\n\nLocal and Remote metadata are synchronized, no sync needed.\nLast full backup date: Thu Sep 15 06:01:49 2016\n\nroot@server01:\/bkp\/restore# ls -l mongo\/\ntotal 12\ndrwxr-xr-x 2 root root 4096 Sep 15 05:57 admin\ndrwxr-xr-x 2 root root 4096 Sep 15 05:57 encompass\ndrwxr-xr-x 2 root root 4096 Sep 15 05:57 encompass_admin\n-rw-r--r-- 1 root root    0 Sep 15 05:57 oplog.bson\n<\/code><\/pre>\n<p>Now we can use <code>mongorestore<\/code> to recover the databases as usual, using the <code>--oplogReplay<\/code> option for a consistent recovery via <code>oplog<\/code> replay.<\/p>\n<h3>Restore specific file(s)<\/h3>\n<p>Let&#8217;s say we want to restore a specific collection from the mongo backup:<\/p>\n<pre><code>root@server01:\/bkp\/restore# mkdir files\n\nroot@server01:\/bkp\/restore# PASSPHRASE=\"rwPo1U7+8xMrq6vvuTX9Rj7ILck=\" duplicity restore -v 4 --s3-european-buckets --s3-use-new-style --file-to-restore encompass\/my-collection-name.bson \"s3+http:\/\/my-s3-bucket\/trtest\/${HOSTNAME}\/mongo\" files\/my-collection-name.bson\nLocal and Remote metadata 
are synchronized, no sync needed.\nLast full backup date: Thu Sep 15 06:01:49 2016\nroot@server01:\/bkp\/restore#\n<\/code><\/pre>\n<p>To make duplicity really verbose we can increase the verbosity level to 9 for the next file, so we can see what duplicity is doing under the hood.<\/p>\n<p>We can now see our two restored files:<\/p>\n<pre><code>root@server01:\/bkp\/restore# ls -l files\ntotal 3564\n-rw-r--r-- 1 root root 3644494 Sep 15 05:57 my-collection-name.bson\n-rw-r--r-- 1 root root     259 Sep 15 05:57 my-collection-name.metadata.json\n<\/code><\/pre>\n<p>and use <code>mongorestore<\/code> to recover the <code>my-collection-name<\/code> collection in the encompass database.<\/p>\n<h2>Automating the backups<\/h2>\n<p>The attached scripts can be used to back up the Tomcat saved documents, MongoDB and Elasticsearch via crontab. For example:<\/p>\n<pre><code class=\"bash\"># Duplicity backups to Amazon S3\n00 02 * * * \/usr\/local\/bin\/duplicity_es_backup.sh my-s3-bucket &gt; \/dev\/null 2&gt;&amp;1\n15 02 * * * \/usr\/local\/bin\/duplicity_data_backup.sh my-s3-bucket &gt; \/dev\/null 2&gt;&amp;1\n30 02 * * * \/usr\/local\/bin\/duplicity_mongodb_backup.sh my-s3-bucket &gt; \/dev\/null 2&gt;&amp;1\n<\/code><\/pre>\n<p>We store the passphrase and other sensitive data in a <code>~\/.duplicity<\/code> file that we source at runtime so we don&#8217;t have to provide them in the scripts in clear text.<\/p>\n<pre><code># GPG key passphrase\nexport PASSPHRASE=\"rwPo1U7+8xMrq6vvuTX9Rj7ILck=\"\n# the IAM user credentials\nexport AWS_ACCESS_KEY_ID=\"my-aws-access-key\"\nexport AWS_SECRET_ACCESS_KEY=\"my-aws-secret-key\"\n<\/code><\/pre>\n<p>and set proper permissions:<\/p>\n<pre><code>root@server01:\/bkp\/restore# chmod 0600 ~\/.duplicity\n<\/code><\/pre>\n<p>Example of a cron run for the mongo backup script:<\/p>\n<pre>\nDate: Fri, 16 Sep 2016 02:30:12 +0100 (BST)\nFrom: Cron Daemon &lt;root@server01.mydomain.com&gt;\nTo: root@server01.mydomain.com\nSubject: Cron &lt;root@server01&gt; \/usr\/local\/bin\/duplicity_mongodb_backup.sh\n \n[...]\nLocal and Remote metadata are synchronized, no sync needed.\nLast full backup date: Thu Sep 15 06:01:49 2016\n--------------[ Backup Statistics ]--------------\nStartTime 1473989408.98 (Fri Sep 16 02:30:08 2016)\nEndTime 1473989411.27 (Fri Sep 16 02:30:11 2016)\nElapsedTime 2.28 (2.28 seconds)\nSourceFiles 127\nSourceFileSize 472458569 (451 MB)\nNewFiles 4\nNewFileSize 16384 (16.0 KB)\nDeletedFiles 0\nChangedFiles 123\nChangedFileSize 472442185 (451 MB)\nChangedDeltaSize 0 (0 bytes)\nDeltaEntries 127\nRawDeltaSize 7486 (7.31 KB)\nTotalDestinationSizeChange 6769 (6.61 KB)\nErrors 0\n-------------------------------------------------\n[...]\n<\/pre>\n<p>The scripts are available for download: <a href=\"https:\/\/icicimov.github.io\/blog\/download\/duplicity_mongodb_backup.sh\">duplicity_mongodb_backup.sh<\/a>, <a href=\"https:\/\/icicimov.github.io\/blog\/download\/duplicity_es_backup.sh\">duplicity_es_backup.sh<\/a>, <a href=\"https:\/\/icicimov.github.io\/blog\/download\/duplicity_data_backup.sh\">duplicity_data_backup.sh<\/a>. The backup strategy in the scripts is a full backup every 2 weeks with daily incrementals in between, keeping the last 12 full backups and removing all incremental backups older than the last 4 full backups.<\/p>\n<h2>Duply (simple duplicity)<\/h2>\n<p><a href=\"http:\/\/duply.net\/\">Duply<\/a> is a kind of front-end for duplicity. According to its documentation, duply simplifies running duplicity with cron or on the command line by:<\/p>\n<ul>\n<li>keeping recurring settings in profiles per backup job<\/li>\n<li>automated import\/export of keys between profile and keyring<\/li>\n<li>enabling batch operations e.g. 
backup_verify_purge<\/li>\n<li>executing pre\/post scripts<\/li>\n<li>precondition checking for flawless duplicity operation<\/li>\n<\/ul>\n<p>Worth looking into in case we need it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Duplicity is a tool for creating bandwidth-efficient, incremental, encrypted backups. It backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. And because duplicity uses librsync, the incremental archives are space efficient and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[32,34,33],"class_list":["post-307","post","type-post","status-publish","format-standard","hentry","category-aws","tag-aws","tag-gpg","tag-s3"],"_links":{"self":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/307","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=307"}],"version-history":[{"count":4,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/307\/revisions"}],"predecessor-version":[{"id":447,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/307\/revisions\/447"}],"wp:attachment":[{"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=307"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=307"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/i
cicimov.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=307"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}