File with random data but a specific size


I am trying to generate a file in Ruby that has a specific size. The content doesn't matter.

Here is what I got so far (and it works!):

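# NAME is the file's base name; SIZE is the number of 1 MB chunks to write
File.open("done/#{NAME}.txt", 'w') do |f|
  contents = "x" * (1024*1024)  # 1 MB of the same character
  SIZE.to_i.times { f.write(contents) }
end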

The problem is: once I zip or rar this file, the created archive is only a few KB in size. I guess that's because the data in the file (the same character repeated over and over) compresses extremely well.

How do I create data that is more random, as if it were a normal file (for example, a movie file)? To be specific: how do I create a file with random data that keeps its size when archived?


There are 2 answers

Neil Slater (best answer)

You cannot guarantee an exact file size when compressing. However, as you suggest in the question, completely random data does not compress.

You can generate a random string using most random number generators. Even simple ones can produce hard-to-compress data, but you would have to write your own string-creation code. Luckily for you, Ruby's standard library includes SecureRandom, which already has a convenient byte-generating method, and you can use it in a variation of your code:

require 'securerandom'
one_megabyte = 2 ** 20 # or 1024 * 1024, if you prefer

# Use 'wb' (binary) mode to prevent problems with character encoding.
# NAME and SIZE (the number of megabytes) are the constants from the question.
File.open("done/#{NAME}.txt", 'wb') do |f|
  SIZE.to_i.times { f.write(SecureRandom.random_bytes(one_megabyte)) }
end

This file is not going to compress much, if at all. Many compressors will detect that and just store the file as-is (making a .zip or .rar file slightly larger than the original).
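To see the difference this answer relies on, here is a small sketch that compresses 1 MB of repeated "x" (as in the question), 1 MB from Ruby's non-cryptographic, seedable Random#bytes (one of the "simple" generators), and 1 MB from SecureRandom.random_bytes. It assumes zlib's deflate (Zlib::Deflate.deflate from Ruby's standard library) is a reasonable stand-in for a zip compressor; rar uses its own algorithm, so its exact numbers will differ.

```ruby
require 'securerandom'
require 'zlib'

one_megabyte = 2 ** 20

samples = {
  "repeated 'x'" => "x" * one_megabyte,                      # the question's contents
  "Random#bytes" => Random.new(42).bytes(one_megabyte),      # simple, seedable PRNG
  "SecureRandom" => SecureRandom.random_bytes(one_megabyte)  # the answer's contents
}

samples.each do |label, data|
  # Zlib::Deflate.deflate is used here as a stand-in for a zip compressor
  compressed = Zlib::Deflate.deflate(data)
  puts format("%-14s %8d bytes -> %8d bytes", label, data.bytesize, compressed.bytesize)
end
```

The repeated data should shrink to around a kilobyte, while both random sources should come out at about 1 MB, often a few bytes more, because deflate cannot shrink them and still adds its own framing.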

Cary Swoveland

For a given string size N and compression method c (e.g., from the rubyzip, libarchive or seven_zip_ruby gems), you want to find a string str such that:

str.size == N && c(str).size == N

I'm doubtful that you can be assured of finding such a string, but here's a way that should come close:

  • Step 0: Select a number m such that m > N.

  • Step 1: Generate a random string s with m characters.

  • Step 2: Compute str = c(s). If str.size <= N, increase m and repeat Step 1; else go to Step 3.

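  • Step 3: Return str[0,N], the first N bytes of str. Since str is already compressed output, its bytes should be close to random, so c should not be able to shrink this prefix much further.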
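The steps above can be sketched in Ruby as follows. This is a minimal sketch under assumptions that are not in the answer: c is taken to be zlib's deflate (Zlib::Deflate.deflate from Ruby's standard library) rather than one of the gems mentioned, the random string s is made of lowercase ASCII letters, and "increase m" is implemented by doubling m. The helper names compress, random_letters and incompressible_string are hypothetical.

```ruby
require 'zlib'

# Hypothetical stand-in for the answer's compression method c: zlib's deflate.
def compress(data)
  Zlib::Deflate.deflate(data)
end

# Hypothetical helper: a random string of m lowercase ASCII letters.
def random_letters(m, rng)
  letters = ('a'..'z').to_a
  Array.new(m) { letters[rng.rand(letters.size)] }.join
end

# Hypothetical helper: returns the first n bytes of c(s) for a long enough random s.
def incompressible_string(n, rng = Random.new)
  m = n + 1                                          # Step 0: choose m > n
  loop do
    s = random_letters(m, rng)                       # Step 1: random string with m characters
    str = compress(s)                                # Step 2: str = c(s)
    return str.byteslice(0, n) if str.bytesize > n   # Step 3: first n bytes of str
    m *= 2                                           # Step 2: str too short, so increase m and retry
  end
end

data = incompressible_string(10_000)
puts data.bytesize             # → 10000
puts compress(data).bytesize   # should be close to 10000
```

The result is binary deflate output rather than text, so it should be written with 'wb' mode, as in the first answer.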