Determining if two rar files are part of the same set


Let's say I have two files, (name).n.rar and (name).n+1.rar, which appear to be part of the same set (same size, etc.). Is there an easy way to tell whether they're actually part of the same set, without first downloading the full set? Currently the only way I can tell is by downloading an instance of every file and then seeing whether WinRAR gives me an error when I try to extract them.

(And on a related note, assuming there is such a method, can I do the same without having adjacent parts?)

Ideally there's an existing program that can do this, but I can code my own if necessary.

Further notes: These are two sets of archives of the same file. They appear identical to obvious checks: filenames are consecutive, contents are sane, sizes are identical, same number of parts. I then receive a full set of files. If they're not from the same set, I can't unrar them, though it seems that WinRAR will proceed to 100% before giving me the CRC error (file corrupt).


There are 2 answers

Boris Brodski On

I'm not that familiar with the RAR format, but if you decide to write your program in Java, I can recommend using 7-Zip-JBinding.

You can download the first n+1 parts of the archive and then call the extract() method, ignoring the output data and paying attention only to the

IArchiveExtractCallback.setOperationResult(ExtractOperationResult) 

calls (checking that the CRC was OK) and monitoring which files get opened through

IArchiveOpenVolumeCallback.getStream(java.lang.String)

If volume n+2 gets requested, you can conclude that volume n+1 was the right one. (I'm not 100% sure about this conclusion, but I would give it a try.)

Vereos On

New Answer

All tests were made using WinRAR 5.01 32-bit. Since the algorithm should be the same, the following statements should also hold for earlier versions. Feel free to comment if you know that's not true.

Here is a short summary of what came out of the chat. I packed a file larger than 1 GB several times, then mixed up parts from the different runs and tried to extract the archives: it worked. So the size of the file was not the problem.

I thought about three possible explanations for the problem:

  1. The machine architecture influences the packing process: different people packed the files, so mixing their parts results in an error;
  2. Different people packed the files with slightly different part sizes (for example 250 MB versus 250000 KB). This would have shown up in the file properties, though;
  3. The files were corrupted during the download: re-downloading them would confirm this hypothesis.
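
Hypothesis 3 can also be checked without extracting anything, assuming a known-good copy of the part (or a checksum published by the uploader) is available; the question does not say one is. A minimal Python sketch, where `sha256_of` and `same_content` are names made up for this illustration:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If the digests match, the download is not the culprit and the part really does come from a different packing run.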

I was most curious about the first one: could the architecture influence the packing process?

I found out that the answer is yes. Here are the steps to repeat the experiment:

  1. Pack your files into an archive, giving a precise part size, on computer A;
  2. Pack exactly the same files, giving exactly the same part size, on computer B, which has a different architecture (e.g. an Intel processor versus an AMD processor). (TODO: check whether the experiment still gives the same result with similar architectures, e.g. Intel i7 and Intel i5);
  3. Transfer one or more parts (but of course not all of them!) from computer B to computer A. Remember to delete the corresponding files on computer A before the transfer;
  4. Place all the files in the same directory and check that they all follow the same naming scheme (e.g. "AAA part1", "AAA part2"...);
  5. Extract them;
  6. Enjoy your CRC error!

Tests were made using an Intel i7-3632QM and an AMD FX 6300.

I suspect that the compressed data is the same and only the CRC code is different.
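
One way to probe that suspicion would be to compare the same-numbered part from each machine byte by byte and look at where the differences fall. A minimal Python sketch; `differing_offsets` is a hypothetical helper, not part of any RAR tool, and it only compares the overlapping prefix if the two lengths differ:

```python
def differing_offsets(data_a: bytes, data_b: bytes, limit: int = 20):
    """Return up to `limit` byte offsets at which the two inputs differ."""
    offsets = []
    # zip() stops at the shorter input, so a trailing length
    # difference is not reported here.
    for i, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            offsets.append(i)
            if len(offsets) >= limit:
                break
    return offsets
```

If fewer than `limit` offsets come back, that is the complete list of differences in the compared prefix. A handful of differing bytes would fit the suspicion above, while hitting the limit almost immediately would suggest the compressed data itself differs.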


Old Answer

There is a way indeed. During my Computer Science studies, we had a Computer Forensics class. I learned that most file formats have a fixed beginning (a header, we could say) that lets a program recognize the file's type and how to decode it. To see it, you just have to open the file with a text editor (Notepad++ is the best so far, I guess).

For example, JPEG images with Exif data begin with ÿØÿá, which is how a text editor displays the bytes FF D8 FF E1.
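
To see why those four characters show up, here is a small Python illustration. It assumes the editor displays the file as Latin-1, where each byte maps to exactly one character; `looks_like_jpeg` is a made-up helper name:

```python
# The four characters a Latin-1 text editor shows at the start of the file.
text = "ÿØÿá"

# Encoding them back as Latin-1 recovers the raw header bytes.
raw = text.encode("latin-1")
print(raw.hex(" "))  # ff d8 ff e1

def looks_like_jpeg(data: bytes) -> bool:
    """Check only the first three bytes; the fourth varies by JPEG flavour."""
    return data[:3] == b"\xff\xd8\xff"
```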

I tried storing a video in several split .rar files, and finding out whether they are part of the same archive was simpler than I thought.

Every RAR file begins with Rar!. On the second or third line, the name of the file stored in the archive should appear: in my case, myVideo.mp4. If all your archives contain that filename, they're probably part of the same archive.
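
The same check can be done in a few lines of Python instead of a text editor. This is only a sketch: `rar_format` and `mentions_name` are hypothetical helpers, and the name search assumes the archive headers are not encrypted and that the name is stored as plain ASCII/UTF-8 bytes. The two byte strings are the signatures used by the RAR 4.x and RAR 5.0 formats.

```python
RAR4_MAGIC = b"Rar!\x1a\x07\x00"       # RAR 1.5 to 4.x signature
RAR5_MAGIC = b"Rar!\x1a\x07\x01\x00"   # RAR 5.0 signature

def rar_format(data: bytes):
    """Return 'rar5', 'rar4', or None based on the leading bytes."""
    if data.startswith(RAR5_MAGIC):
        return "rar5"
    if data.startswith(RAR4_MAGIC):
        return "rar4"
    return None

def mentions_name(data: bytes, name: str) -> bool:
    """True if the raw bytes of `name` occur anywhere in `data`."""
    return name.encode("utf-8") in data
```

Reading only the first few kilobytes of each part is enough for the signature, and usually for the first filename as well, so this does not require downloading the whole part.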

Things get harder if there are several files in the archive and you don't know their names. If there is more than one file, the structure of the RAR files is as follows:

File 1:

Rar!
NUL NUL NUL //Random things here
NUL NUL NUL NUL NUL myVideo.mp4 NUL NUL NUL NUL
//Random things here. If the dimensions of the file exceed the archive,
//the next file will begin with the same name.
//Let's assume that this is happening.
EOF

File 2:

Rar!
NUL NUL NUL //Random things here
NUL NUL myVideo.mp4 NUL NUL NUL
//This time the file is complete. Since there is still space in the archive,
//it will add another file
NUL NUL NUL NUL mySecondVideo.mp4 NUL NUL NUL NUL
EOF

Let's assume that at the end of the second archive, mySecondVideo hasn't been fully compressed yet.

File 3:

Rar!
NUL NUL NUL
NUL NUL NUL NUL mySecondVideo.mp4 NUL
NUL NUL NUL
NUL myTextFile.txt
NUL NUL NUL mySecondTextFile.txt NUL
EOF

If mySecondTextFile.txt isn't yet fully compressed, my fourth file will begin with its name.

I hope that's clear; I tried to keep it as simple as possible. When there are several files, I would start from the last archive: I'd write down the first filename found in that file and search for it in the previous one. If I found that name, I'd repeat the process until I reached the first archive.
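
The backward walk described above could be sketched in Python as follows. Everything here is an assumption for illustration: `candidate_names` and `first_break` are made-up names, and the regular expression is a rough guess at what "looks like a filename" (ASCII only), so compressed data may occasionally match by accident. It also relies on the layout shown above, where a file split across two parts is named in both.

```python
import re

# Heuristic: a run of filename-like ASCII characters ending in a short
# extension. Treat matches as candidates, not as parsed RAR headers.
NAME_RE = re.compile(rb"[\w\-. ]{3,}\.[A-Za-z0-9]{2,4}")

def candidate_names(data):
    """Return filename-looking strings in the order they appear."""
    return [m.group().decode("ascii") for m in NAME_RE.finditer(data)]

def first_break(volumes):
    """Walk from the last volume back to the first.

    Returns the index of the first volume (counting from the end) whose
    leading name is missing from the volume before it, or None if every
    volume links to its predecessor.
    """
    names = [candidate_names(v) for v in volumes]
    for k in range(len(volumes) - 1, 0, -1):
        if not names[k] or names[k][0] not in names[k - 1]:
            return k
    return None
```

A result of None only means the filenames chain up. As the new answer shows, parts packed on different machines can carry the same names and still fail with a CRC error, so this is a necessary check rather than a sufficient one.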