CRC16 collision (2 CRC values of blocks of different size)

1.6k views Asked by At

The Problem

I have a textfile which contains one string per line (linebreak \r\n). This file is secured using CRC16 in two different ways.

  1. CRC16 of blocks of 4096 bytes
  2. CRC16 of blocks of 32768 bytes

Now I have to modify any of these 4096 byte blocks, so it (the block)

  • contains a specific string
  • does not change the size of the textfile
  • has the same CRC value as the original block (same for the 32k block, that contains this 4k block)

Depart of that limitations I may do any modifications to the block that are required to fullfill it as long as the file itself does not break its format. I think it is the best to use any of the completly filled 4k blocks, not the last block, that could be really short.

The Question

How should I start to solve that problem? The first thing I would come up is some kind of bruteforce but wouldn't it take extremly long to find the changes that will result in both CRC values stay the same? Is there probably a mathematical way to solve that?

It should be done in seconds or max. few minutes.

2

There are 2 answers

3
usr On BEST ANSWER

There are math ways to solve this but I don't know them. I'm proposing a brute-force solution:

A block looks like this:

SSSSSSSMMMMEEEEEEE

Each character represents a byte. S = start bytes, M = bytes you can modify, E = end bytes.

After every byte added to the CRC it has a new internal state. You can reuse the checksum state up to that position that you modify. You only need to recalculate the checksum for the modified bytes and all following bytes. So calculate the CRC for the S-part only once.

You don't need to recompute the following bytes either. You just need to check whether the CRC state is the same or different after the modification you made. If it is the same, the entire block will also be the same. If it is different, the entire block is likely to be different (not guaranteed, but you should abort the trial). So you compute the CRC of just the S+M' part (M' being the modified bytes). If it equals the state of CRC(S+M) you won.

That way you have much less data to go through and a recent desktop or server can do the 2^32 trials required in a few minutes. Use parallelism.

1
Mark Adler On

Take a look at spoof.c. That will directly solve your problem for the CRC of the 4K block. However you will need to modify the code to solve the problem simultaneously for both the CRC of the 4K block and the CRC of the enclosing 32K block. It is simply a matter of adding more equations to solve. The code is extremely fast, running in O(log(n)) time, where n is the length of the message.

The basic idea is that you will need to solve 32 linear equations over GF(2) in 32 or more unknowns, where each unknown is a bit location that you are permitting to be changed. It is important to provide more than 32 unknowns with which to solve the problem, since if you pick exactly 32, it is not at all unlikely that you will end up with a singular matrix and no solution. The spoof code will automatically find non-singular choices of 32 unknown bit locations out of the > 32 that you provide.