How can I encrypt large files in chunks?


I tried the aes-gcm and sodiumoxide Rust crates, and in both cases I ran into the same issue: when I encrypt a file larger than the chunk buffer, I get an invalid ciphertext that cannot be decrypted.

Here are the conditions I want this program to run under:
I need to encrypt a 50 GB file but have only 2 GB of RAM and 4 GB of storage, so I can't load the entire file into memory and I can't write it to a new file in chunks. I need to somehow replace each chunk of the file with its encrypted counterpart.

Here's my code:

use clap::Parser;
use sodiumoxide::crypto::secretbox;
use sodiumoxide::crypto::secretbox::{Key, Nonce};
use sodiumoxide::hex;
use std::fs::OpenOptions;
use std::io::{self, Read, Seek, SeekFrom, Write};

const CHUNK_SIZE: usize = 1024; // Adjust chunk size as needed

#[derive(Parser)]
struct Options {
    #[clap(subcommand)]
    command: Command,
}

#[derive(Parser)]
enum Command {
    Encrypt,
    Decrypt,
}

fn main() -> io::Result<()> {
    let args = Options::parse();
    match args.command {
        Command::Encrypt => encrypt_file("poemfortest.txt"),
        Command::Decrypt => decrypt_file("poemfortest.txt"),
    }
}

fn encrypt_file(input_file_path: &str) -> io::Result<()> {
    let input_file = OpenOptions::new()
        .read(true)
        .write(true)
        .open(input_file_path)?;

    // Initialize sodiumoxide library
    sodiumoxide::init().expect("Failed to initialize sodiumoxide");

    let rawkey =
        hex::decode("ba72744932db553b55cb944aa8e5739cf23a7f668c32d164c51ce09d4d631160").unwrap();

    let rawnonce = hex::decode("48ccdb552de220538ac1667e7e99054fd39ad417c5d83c6e").unwrap();

    // Generate the key and nonce used for encryption
    let key = Key::from_slice(&rawkey).expect("Invalid key");
    let nonce = Nonce::from_slice(&rawnonce).expect("Invalid nonce");

    let mut input_file = input_file;
    let mut buffer = [0u8; CHUNK_SIZE];
    let mut offset: u64 = 0;

    loop {
        let bytes_read = input_file.read(&mut buffer)?;

        if bytes_read == 0 {
            break;
        }

        // Encrypt the chunk
        let encrypted_chunk = secretbox::seal(&buffer[..bytes_read], &nonce, &key);

        // Seek back to the beginning of the chunk
        input_file.seek(SeekFrom::Start(offset))?;

        // Write the encrypted chunk to the input file
        input_file.write_all(&encrypted_chunk)?;

        // Move the offset to the next chunk
        offset += bytes_read as u64;
    }

    println!("Encryption completed successfully.");
    Ok(())
}

fn decrypt_file(input_file_path: &str) -> io::Result<()> {
    let input_file = OpenOptions::new()
        .read(true)
        .write(true)
        .open(input_file_path)?;

    // Initialize sodiumoxide library
    sodiumoxide::init().expect("Failed to initialize sodiumoxide");

    let rawkey =
        hex::decode("ba72744932db553b55cb944aa8e5739cf23a7f668c32d164c51ce09d4d631160").unwrap();

    let rawnonce = hex::decode("48ccdb552de220538ac1667e7e99054fd39ad417c5d83c6e").unwrap();

    // Generate the key and nonce used for decryption
    let key = Key::from_slice(&rawkey).expect("Invalid key");
    let nonce = Nonce::from_slice(&rawnonce).expect("Invalid nonce");

    let mut input_file = input_file;
    let mut buffer = [0u8; CHUNK_SIZE];
    let mut offset: u64 = 0;

    loop {
        let bytes_read = input_file.read(&mut buffer)?;

        if bytes_read == 0 {
            break;
        }

        // Decrypt the chunk
        let decrypted_chunk = match secretbox::open(&buffer[..bytes_read], &nonce, &key) {
            Ok(decrypted) => decrypted,
            Err(_) => {
                eprintln!("Error: Failed to decrypt chunk.");
                return Ok(());
            }
        };

        // Seek back to the beginning of the chunk
        input_file.seek(SeekFrom::Start(offset))?;

        // Write the decrypted chunk to the input file
        input_file.write_all(&decrypted_chunk)?;

        // Move the offset to the next chunk
        offset += bytes_read as u64;
    }

    println!("Decryption completed successfully.");
    Ok(())
}

I expected it to work for files larger than the chunk buffer, but it turns out my chunking implementation is broken.

There is 1 answer

Maarten Bodewes

I'd memory-map the file, encrypt it in place, and store the nonce and authentication tag at the end. In my opinion, GCM and ChaCha20/Poly1305 are both a bit awkward for this purpose. ChaCha20 is somewhat better because it offers a larger nonce and maximum message size, but it may be tricky to make the implementations behave correctly. Meanwhile, there is no problem with using AES-CTR together with HMAC-SHA-256, and many processors have SHA-256 acceleration. If it is slow on a 64-bit system, try SHA-512; you might be surprised.
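To illustrate why a length-preserving cipher matters for in-place work: `secretbox::seal` returns a ciphertext that is 16 bytes longer than its input (the MAC), so writing it back over a chunk spills into the next, still unread, chunk. Below is a minimal std-only sketch of the in-place loop itself. The function names are hypothetical, and `toy_keystream_byte` is only a stand-in for a real stream cipher such as AES-CTR; it provides no security whatsoever.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Chunk size for the sketch; any non-zero size works.
const CHUNK_SIZE: usize = 1024;

/// Toy keystream byte for absolute position `pos`.
/// NOT a cipher: a hypothetical stand-in for a real stream cipher such as AES-CTR.
fn toy_keystream_byte(pos: u64, key: u8) -> u8 {
    key ^ (pos as u8).wrapping_mul(31).wrapping_add((pos >> 8) as u8)
}

/// Transforms `file` in place, chunk by chunk, without changing its length.
/// Applying it twice with the same key restores the original bytes.
/// Returns the number of bytes processed.
fn transform_in_place<F: Read + Write + Seek>(file: &mut F, key: u8) -> io::Result<u64> {
    let mut buffer = vec![0u8; CHUNK_SIZE];
    let mut offset: u64 = 0;
    loop {
        // Always read from the tracked offset, not from wherever the last write ended.
        file.seek(SeekFrom::Start(offset))?;
        let n = file.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        for (i, byte) in buffer[..n].iter_mut().enumerate() {
            *byte ^= toy_keystream_byte(offset + i as u64, key);
        }
        // Overwrite exactly the bytes that were read, no more and no less.
        file.seek(SeekFrom::Start(offset))?;
        file.write_all(&buffer[..n])?;
        offset += n as u64;
    }
    Ok(offset)
}
```

Because the output of each chunk has the same length as its input, the read and write positions never drift apart, and no authentication data has to be squeezed in between chunks; a single tag can be appended once at the end.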

The example below was generated with ChatGPT, as I'm not a Rust programmer. I did review the code and had it adjusted multiple times, and it was generated to my exact specifications. I mainly kept it as is to avoid syntax errors.

use aes::Aes256;
use ctr::cipher::{NewCipher, StreamCipher};
use ctr::Ctr128BE;
use hmac::{Hmac, Mac, NewMac};
use memmap2::MmapOptions;
use rand::{RngCore, rngs::OsRng};
use sha2::Sha256;
use std::fs::{File, OpenOptions};
use std::io::{self, Seek, Write};
use std::path::Path;

const CHUNK_SIZE: usize = 1 * 1024 * 1024; // 1 MiB

/// Encrypts a file in place using AES in CTR mode and appends an HMAC for authentication.
///
/// This function performs in-place encryption of the file specified by `path` using
/// AES encryption in Counter (CTR) mode. After encryption, a nonce generated for the encryption
/// and an HMAC (Hash-based Message Authentication Code) are appended to the file for
/// authentication purposes. The HMAC is calculated over the entire encrypted file content
/// including the nonce, using SHA-256 as the hash function. This approach ensures both
/// confidentiality and integrity of the file data.
///
/// # Parameters
/// - `path`: The path to the file to be encrypted. The file will be modified in place.
/// - `encryption_key`: A 32-byte key used for AES encryption. Must be securely generated and managed.
/// - `hmac_key`: A 32-byte key used for HMAC calculation. Must be different from `encryption_key`
///   and securely generated and managed.
///
/// # Returns
/// - `Ok(())` on success, indicating that the file has been encrypted and the nonce and HMAC
///   have been successfully appended to the file.
/// - `Err(io::Error)` on failure, with an error message indicating the type of error that occurred
///   (e.g., file not found, permission denied, I/O error during processing, etc.).
///
/// # Security Considerations
/// - The `encryption_key` and `hmac_key` must be kept secret and securely managed. Exposure of
///   these keys can compromise the security of the encrypted data.
/// - Ensure that the system's random number generator is secure for nonce generation.
///
/// # Example Usage
/// ```
/// let path = Path::new("path/to/file.txt");
/// let encryption_key = [0u8; 32]; // Use a secure method to generate/store the encryption key
/// let hmac_key = [1u8; 32]; // Use a secure, separate key for HMAC
///
/// if let Err(e) = encrypt_file_aes_ctr_hmac(path, &encryption_key, &hmac_key) {
///     eprintln!("Error encrypting file: {}", e);
/// }
/// ```
///
fn encrypt_file_aes_ctr_hmac(path: &Path, encryption_key: &[u8; 32], hmac_key: &[u8; 32]) -> io::Result<()> {
    let file_len = File::open(path)?.metadata()?.len() as usize;
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    // Generate a random nonce
    let mut nonce = [0u8; 16];
    OsRng.fill_bytes(&mut nonce);

    // Initialize HMAC
    let mut hmac = Hmac::<Sha256>::new_from_slice(hmac_key).expect("HMAC can take key of any size");

    // Initialize the encryption cipher in CTR mode
    let mut cipher = Ctr128BE::<Aes256>::new_from_slices(encryption_key, &nonce).unwrap();

    for offset in (0..file_len).step_by(CHUNK_SIZE) {
        let chunk_size = std::cmp::min(CHUNK_SIZE, file_len - offset);

        // Memory-map the chunk
        let mmap_opts = MmapOptions::new().offset(offset as u64).len(chunk_size);
        let mut mmap = unsafe { mmap_opts.map_mut(&file)? };

        // Encrypt the chunk in place
        cipher.apply_keystream(&mut mmap);

        // Update HMAC with the encrypted chunk
        hmac.update(&mmap);
    }

    // Include the nonce in the HMAC calculation after processing all chunks
    hmac.update(&nonce);

    // Move to the end of the file. Do not call `set_len` here: growing the
    // file first would leave zero bytes between the ciphertext and the nonce.
    file.seek(io::SeekFrom::End(0))?;

    // Append the nonce to the file
    file.write_all(&nonce)?;

    // Finalize HMAC calculation
    let hmac_result = hmac.finalize().into_bytes();

    // Append HMAC to the file, after the nonce
    file.write_all(&hmac_result)?;

    Ok(())
}

fn main() {
    let path = Path::new("example.txt");
    let encryption_key = [0u8; 32]; // Securely generate/store the encryption key
    let hmac_key = [1u8; 32]; // Use a secure, separate key for HMAC

    if let Err(e) = encrypt_file_aes_ctr_hmac(path, &encryption_key, &hmac_key) {
        eprintln!("Error processing file: {}", e);
    }
}

Users beware:

  • This code does not keep track of which part of the file has been encrypted. Any error or shutdown may leave a partially encrypted file with little hope of recovery. Copy-then-encrypt might work, or possibly keeping state at the end of the file (writing the nonce first, of course).
  • A single key could be used: there is little chance of leaking information about the key if it is used for both HMAC and AES. It is, however, best practice to derive or use two separate keys.
  • Encrypting too many files this way may produce a nonce collision, and the more files are encrypted, the higher the chance. You could create a key (pair) per file using a key derivation function.

Almost forgot: to me the decryption function is too boring to add, but here it is anyway. Sharing the constants with the encryption function would probably be a good idea.

DO NOT perform operations on the file before the HMAC value has been verified and the nonce & HMAC have been removed! If this function fails for any reason you may need to perform some cleanup!

use aes::Aes256;
use ctr::cipher::{NewCipher, StreamCipher};
use ctr::Ctr128BE;
use hmac::{Hmac, Mac, NewMac};
use memmap2::MmapOptions;
use sha2::Sha256;
use std::fs::OpenOptions;
use std::io::{self, ErrorKind};
use std::path::Path;

const CHUNK_SIZE: usize = 1 * 1024 * 1024; // Ensure this matches the encryption chunk size
const NONCE_SIZE: usize = 16;
const HMAC_SIZE: usize = 32;

fn decrypt_file_aes_ctr_hmac(path: &Path, encryption_key: &[u8; 32], hmac_key: &[u8; 32]) -> io::Result<()> {
    let file = OpenOptions::new().read(true).write(true).open(path)?;

    // Determine the size of the file
    let file_len = file.metadata()?.len() as usize;
    if file_len < NONCE_SIZE + HMAC_SIZE {
        return Err(io::Error::new(ErrorKind::InvalidData, "File is too short"));
    }

    // Memory-map the file
    let mut mmap = unsafe { MmapOptions::new().map_mut(&file)? };
    let data_len = file_len - NONCE_SIZE - HMAC_SIZE;

    // Copy the nonce and HMAC out of the mapping so that no shared borrow of
    // `mmap` is still alive when the chunks are borrowed mutably below
    let nonce = mmap[data_len..data_len + NONCE_SIZE].to_vec();
    let stored_hmac = mmap[data_len + NONCE_SIZE..].to_vec();

    // Initialize the HMAC
    let mut hmac = Hmac::<Sha256>::new_from_slice(hmac_key).expect("HMAC can take key of any size");

    // Initialize the decryption cipher with the nonce
    let mut cipher = Ctr128BE::<Aes256>::new_from_slices(encryption_key, &nonce).unwrap();

    // Process the file in chunks for HMAC update and decryption
    for i in (0..data_len).step_by(CHUNK_SIZE) {
        let end = std::cmp::min(i + CHUNK_SIZE, data_len);
        let chunk = &mut mmap[i..end];

        // The HMAC was computed over the ciphertext during encryption, so
        // update it before the chunk is decrypted in place
        hmac.update(chunk);
        cipher.apply_keystream(chunk);
    }

    // Finalize HMAC calculation with the nonce and verify
    hmac.update(&nonce);
    if hmac.verify(&stored_hmac).is_err() {
        return Err(io::Error::new(ErrorKind::InvalidData, "HMAC verification failed"));
    }

    // Truncate the file to remove the nonce and HMAC, restoring the original content length
    drop(mmap); // Important: drop the mmap before truncating the file
    OpenOptions::new().write(true).open(path)?.set_len(data_len as u64)?;

    Ok(())
}
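As a possible alternative to slicing the trailer out of the memory map, the nonce and HMAC can be copied into owned arrays with plain seeks before any data is touched. This is only a sketch under the layout used above (ciphertext, then a 16-byte nonce, then a 32-byte HMAC); `read_trailer` is a hypothetical helper name, not part of any crate.

```rust
use std::io::{self, ErrorKind, Read, Seek, SeekFrom};

const NONCE_SIZE: usize = 16;
const HMAC_SIZE: usize = 32;

/// Reads the trailing nonce and HMAC tag and returns them together with the
/// length of the ciphertext that precedes them.
fn read_trailer<R: Read + Seek>(
    reader: &mut R,
) -> io::Result<(u64, [u8; NONCE_SIZE], [u8; HMAC_SIZE])> {
    let trailer_len = (NONCE_SIZE + HMAC_SIZE) as u64;
    // Seeking to the end yields the total length without a metadata call.
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < trailer_len {
        return Err(io::Error::new(ErrorKind::InvalidData, "File is too short"));
    }
    let data_len = file_len - trailer_len;
    // The nonce starts right after the ciphertext, the tag right after the nonce.
    reader.seek(SeekFrom::Start(data_len))?;
    let mut nonce = [0u8; NONCE_SIZE];
    let mut tag = [0u8; HMAC_SIZE];
    reader.read_exact(&mut nonce)?;
    reader.read_exact(&mut tag)?;
    Ok((data_len, nonce, tag))
}
```

Owning the nonce and tag up front means the ciphertext region can later be mapped or rewritten without holding any reference into the trailer.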