Streaming Compression and Encryption with Tar and OpenSSL
Posted on May 20, 2014

First of all, you want to compress before you encrypt (StackOverflow).
Second, you want to use a compression algorithm that you like. I'm choosing lzma because that's what Martin Scharm's research found to be the best algorithm. He doesn't mention 7zip, which is also awesome.
Third, you want to choose your encryption method. I landed on asymmetric encryption (Denis Altudov gives very good reasons why you should), and openssl seemed like a natural choice.
Setup
To do this you just need to create an SSL certificate and key, replacing "hostname" with your server's host name:
$ openssl req -x509 -nodes -newkey rsa:4096 -keyout hostname.key -out hostname.crt
For more information on what to enter at the prompts, see SSL.
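Note that without -days the certificate is only valid for 30 days. If you want a quick sanity check on what you just created, openssl can print the certificate's subject and expiration dates:
$ openssl x509 -in hostname.crt -noout -subject -dates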
Streaming Compression and Encryption
Assuming "encrypt.txt" is what you want to compress and decrypt (this could also be a directory), run:
$ tar --lzma -cvP encrypt.txt | openssl smime -encrypt -aes-256-cbc -binary -outform DER -out encrypt.txt.lzma.dat.openssl_smime_aes_cbc hostname.crt
I haven't landed on the best name for the output file, but I figure this one makes sense so you remember what you encrypted it with.
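If you want to check that the result really is DER-encoded S/MIME (PKCS#7) data rather than, say, an empty file, openssl's asn1parse will dump its structure (piping to head keeps the output short):
$ openssl asn1parse -inform DER -in encrypt.txt.lzma.dat.openssl_smime_aes_cbc | head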
Streaming Decryption and Decompression
And, assuming "encrypt.txt.lzma.dat.openssl_smime_aes_cbc" is the file you just created, you can run the following to decrypt and decompress on the fly:
$ openssl smime -decrypt -in encrypt.txt.lzma.dat.openssl_smime_aes_cbc -binary -inform DER -inkey hostname.key | tar --lzma -xv
Now, keep in mind that tar will extract the files to whatever your current directory is.
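If you'd rather extract somewhere specific, tar's -C flag switches into that directory first (the restore directory here is just a placeholder and has to exist already):
$ openssl smime -decrypt -in encrypt.txt.lzma.dat.openssl_smime_aes_cbc -binary -inform DER -inkey hostname.key | tar --lzma -xv -C /path/to/restore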
Update 5/28/14
I am realizing just how slow lzma compression is. On average, gzip will get the job done in about 92% less time than lzma, and the resulting file will only be about 3.8% bigger. So, say you have 100 GB and lzma compresses at 1.38 MB/s: it will take about 20 hours and the compressed file will be 79.63 GB. Compare that with gzip on the same 100 GB. At a compression rate of 17.42 MB/s it will take only about 1.6 hours and the compressed file will be 82.81 GB, which is only 3.18 GB bigger. Mike Terzza has a nice graph comparing compression rates.
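If speed matters more to you than the last few percent of compression, the same pipeline works with gzip; this is just the command from above with --lzma swapped for -z and a file name that reflects it (decryption is the same as before, ending in tar -zxv instead):
$ tar -zcvP encrypt.txt | openssl smime -encrypt -aes-256-cbc -binary -outform DER -out encrypt.txt.gz.dat.openssl_smime_aes_cbc hostname.crt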
Update 6/1/14
It turns out that this command is crashing pretty hard with large data sets. I am trying to back up about 80 GB using the above command and piping it over SSH to a remote server, but every time it cuts out at about 615 MB of compressed and encrypted data. I don't know the exact cause, but I think it's obvious I'm going about this the wrong way. That said, this command works very well with small data sets.
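For reference, the kind of over-SSH pipeline I mean looks roughly like this (the source path, user, and host are placeholders; openssl writes to stdout when -out is omitted):
$ tar --lzma -cvP /data/to/backup | openssl smime -encrypt -aes-256-cbc -binary -outform DER hostname.crt | ssh user@backupserver 'cat > backup.lzma.dat.openssl_smime_aes_cbc'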