Force opening and reading zip files from PHP


This may be a simple question or a pretty complex one; I'll let you be the judge.

Writing a PHP class that opens a zip file, extracts its files to a directory and closes it again is not complicated.

But let's say the file is not a zip, yet WinRAR can still read it. Examples of such files are .exe files, SFX (self-extracting) archives, etc.

What do all these files have in common that allows WinRAR to browse their contents?

Another example is antivirus software, which scans the individual files inside an EXE.

So, as an example:

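$handle = fopen("an_unknown_file.abc", "rb");
if ($handle === false) {
    die("Could not open the file");
}
while (!feof($handle))
{
    $chunk = fread($handle, 8192); // read the file 8 KB at a time
    // What generic code could I use to determine whether this file can be extracted?
}
fclose($handle);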

Regards.


There are 2 answers

Marc B (best answer)

The zip specification allows the actual "zip" portion of a file to be embedded ANYWHERE within it; it doesn't have to start at offset 0. This is how self-extracting zips work: a small .exe stub program with a larger .zip archive appended to the end of it.
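To illustrate the "appended to the end" part: zip readers generally locate the End of Central Directory (EOCD) record, signature `PK\x05\x06`, by searching backwards from the end of the file, which is why arbitrary bytes such as an .exe stub can sit in front of it. Below is a minimal plain-PHP sketch under that assumption; the function name `findEndOfCentralDirectory` and the array keys it returns are made up for illustration, and ZIP64 and multi-disk archives are ignored.

```php
<?php
/**
 * Locate the ZIP "End of Central Directory" (EOCD) record in a byte string.
 * Returns the parsed fields, or null if no plausible EOCD is found.
 * Sketch only: ZIP64 and multi-disk archives are not handled.
 */
function findEndOfCentralDirectory(string $data): ?array
{
    $sig = "PK\x05\x06";     // EOCD signature 0x06054b50, stored little-endian
    $len = strlen($data);
    $minSize = 22;            // fixed-size part of the EOCD record
    // Only a comment of at most 65535 bytes may follow the EOCD,
    // so only the tail of the file needs to be searched.
    $searchStart = max(0, $len - $minSize - 0xFFFF);

    for ($pos = $len - $minSize; $pos >= $searchStart; $pos--) {
        if (substr($data, $pos, 4) !== $sig) {
            continue;
        }
        $f = unpack(
            'vdisk/vcdDisk/vdiskEntries/vtotalEntries/VcdSize/VcdOffset/vcommentLen',
            substr($data, $pos + 4, 18)
        );
        // The declared comment length must reach exactly to the end of the data.
        if ($pos + $minSize + $f['commentLen'] !== $len) {
            continue;
        }
        $f['eocdOffset'] = $pos;
        // The central directory normally ends right where the EOCD begins.
        // If the stored offset is relative to the start of the zip data
        // (an unadjusted SFX), this difference is the size of the stub.
        $f['prefixBytes'] = $pos - $f['cdSize'] - $f['cdOffset'];
        return $f;
    }
    return null;
}
```

For example, `findEndOfCentralDirectory(file_get_contents('setup.exe'))` (hypothetical file name) would return the record's position, and a non-zero `prefixBytes` hints at a stub in front of the zip data. Some SFX builders rewrite the stored offsets to be file-relative, in which case `prefixBytes` comes out as 0 even though a stub is present.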

Finding a zip is mostly a matter of scanning a file for a zip's "magic number", then applying a few heuristics to determine whether it's really a zip file or just random data that happens to contain a zip's magic number.
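The "magic number plus heuristics" approach might look like this. It is a sketch that assumes the zip local file header signature `PK\x03\x04` as the magic number; the particular plausibility checks (version field, compression method, file name length) are my own illustrative choices, not what WinRAR or any specific tool does, and `findZipHeaderCandidates` is a hypothetical name.

```php
<?php
/**
 * Scan a byte string for ZIP local file header signatures ("PK\x03\x04")
 * and keep only the hits that pass a few plausibility checks.
 * The checks are illustrative heuristics, not taken from any real tool.
 */
function findZipHeaderCandidates(string $data): array
{
    $sig = "PK\x03\x04";
    $known = [0, 8, 9, 12, 14];   // stored, deflate, deflate64, bzip2, LZMA
    $len = strlen($data);
    $hits = [];
    $offset = 0;

    while (($pos = strpos($data, $sig, $offset)) !== false) {
        $offset = $pos + 1;        // continue scanning after this hit
        if ($pos + 30 > $len) {
            break;                 // no room left for a 30-byte header
        }
        // The 26 bytes after the signature hold the fixed header fields.
        $h = unpack(
            'vversion/vflags/vmethod/vtime/vdate/Vcrc/VcompSize/VrawSize/vnameLen/vextraLen',
            substr($data, $pos + 4, 26)
        );
        $plausible =
            $h['version'] <= 63                        // spec versions go up to 6.3
            && in_array($h['method'], $known, true)    // a compression method we know
            && $h['nameLen'] > 0 && $h['nameLen'] <= 1024
            && $pos + 30 + $h['nameLen'] + $h['extraLen'] <= $len;
        if ($plausible) {
            $hits[] = ['offset' => $pos, 'name' => substr($data, $pos + 30, $h['nameLen'])];
        }
    }
    return $hits;
}
```

Each surviving hit is only a candidate; a stricter check would go on to compare it against the archive's central directory.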

A .docx file is really just a .zip that contains various XML files representing a Word document's contents, just as a .jar is a zip file containing compiled Java classes and other resources.
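Since a .docx or .jar is an ordinary zip, the same structure-reading code applies to all of them. Here is a rough sketch that lists entry names from the central directory; `listZipEntries` is a hypothetical helper, it assumes no ZIP64 and that the EOCD signature does not also appear inside an archive comment, and it derives the directory's start from its stored size so that a prepended stub does not confuse it.

```php
<?php
/**
 * List entry names of a ZIP-based container (.zip, .docx, .jar, ...)
 * by walking its central directory. Sketch only: assumes no ZIP64 and
 * that the EOCD signature does not also occur inside an archive comment.
 */
function listZipEntries(string $data): array
{
    $len = strlen($data);
    $eocd = strrpos($data, "PK\x05\x06");          // End of Central Directory
    if ($eocd === false || $eocd + 22 > $len) {
        return [];
    }
    // Total entry count (EOCD offset 10) and central directory size (offset 12).
    $e = unpack('vtotal/VcdSize', substr($data, $eocd + 10, 6));
    // The central directory ends where the EOCD begins; deriving its start
    // from its size also works when a stub has been prepended to the zip.
    $pos = $eocd - $e['cdSize'];
    if ($pos < 0) {
        return [];
    }

    $names = [];
    for ($i = 0; $i < $e['total']; $i++) {
        if ($pos + 46 > $len || substr($data, $pos, 4) !== "PK\x01\x02") {
            break;   // not a complete central directory header: stop rather than guess
        }
        // File name, extra field and comment lengths live at offsets 28, 30 and 32.
        $l = unpack('vname/vextra/vcomment', substr($data, $pos + 28, 6));
        $names[] = substr($data, $pos + 46, $l['name']);
        $pos += 46 + $l['name'] + $l['extra'] + $l['comment'];
    }
    return $names;
}
```

On a Word file you would typically see names such as `[Content_Types].xml` and `word/document.xml`; on a .jar, `META-INF/MANIFEST.MF` is common.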

WinRAR has a lot of extra code for scanning through a file and looking for any identifiable "this is a compressed archive" signatures, one of which happens to be a zip file's.
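Checking for several formats is the same loop driven by a table of signatures. The list below is a small illustrative subset (zip local header, the RAR 4.x and 5.0 markers, 7z, and gzip's magic plus its deflate method byte); WinRAR's real list and validation logic are not reproduced here, and `scanForArchiveSignatures` is a hypothetical name. Short signatures like gzip's match random bytes fairly often, so hits still need validating.

```php
<?php
/**
 * Report every offset at which one of a few known archive signatures occurs.
 * The signature table is an illustrative subset, not a complete list.
 */
function scanForArchiveSignatures(string $data): array
{
    $signatures = [
        'zip'  => "PK\x03\x04",              // zip local file header
        'rar4' => "Rar!\x1A\x07\x00",        // RAR 1.5 to 4.x marker
        'rar5' => "Rar!\x1A\x07\x01\x00",    // RAR 5.0 marker
        '7z'   => "7z\xBC\xAF\x27\x1C",
        'gzip' => "\x1F\x8B\x08",            // gzip magic + deflate method
    ];
    $found = [];
    foreach ($signatures as $type => $sig) {
        $offset = 0;
        while (($pos = strpos($data, $sig, $offset)) !== false) {
            $found[] = ['offset' => $pos, 'type' => $type];
            $offset = $pos + 1;
        }
    }
    // Sort hits by position so they read in file order.
    usort($found, fn($a, $b) => $a['offset'] <=> $b['offset']);
    return $found;
}
```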

There's nothing too magical about it. It's just a matter of scanning through a file and looking for signatures.
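The `fopen`/`feof` loop from the question can do exactly this kind of scan without loading the whole file into memory. A minimal sketch, assuming the signature is a fixed byte string; `findSignatureOffsets` and its `$chunkSize` parameter are hypothetical. The only subtle part is carrying the last few bytes of each chunk into the next round, so that a signature split across two reads is still found.

```php
<?php
/**
 * Stream through a file in fixed-size chunks and return the absolute offsets
 * of every occurrence of $signature, without loading the whole file.
 * The last (strlen($signature) - 1) bytes of each chunk are carried over so
 * that a signature straddling two chunks is still found exactly once.
 */
function findSignatureOffsets(string $path, string $signature, int $chunkSize = 65536): array
{
    $sigLen = strlen($signature);
    if ($sigLen === 0) {
        throw new InvalidArgumentException('Empty signature');
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $offsets = [];
    $carry = '';   // tail of the previous buffer
    $base = 0;     // file offset of the first byte of the current buffer

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer = $carry . $chunk;
        $pos = 0;
        while (($pos = strpos($buffer, $signature, $pos)) !== false) {
            $offsets[] = $base + $pos;
            $pos++;
        }
        // Keep just enough bytes to catch a signature split across chunks.
        $keep = min($sigLen - 1, strlen($buffer));
        $carry = $keep > 0 ? substr($buffer, -$keep) : '';
        $base += strlen($buffer) - $keep;
    }
    fclose($handle);
    return $offsets;
}
```

Calling `findSignatureOffsets('an_unknown_file.abc', "PK\x03\x04")` would return candidate offsets, which still need the kind of heuristic checks described above.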

Nikoloff

I'm not sure exactly what your question is, but I think you are confusing something here. A file extension is just a convenient way for humans and computers to associate a file with its type and with the programs that work with it. WinRAR (or any other program) reads what the file contains, and if it can understand it, it works with it. The only things that matter are that the file format (the data in the file) is valid and that the program you are using supports that format.

So, if a file is in any format that WinRAR can work with (.rar, .zip, .gz, etc.), its extension could be .txt or .whatever and WinRAR will still be able to work with it. The extension is just for convenience.
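To illustrate that the content, not the name, decides the format: the sketch below looks only at a file's first bytes and never at its extension, so a zip renamed to .txt is still reported as a zip. `detectArchiveFormat` is a hypothetical helper and the magic-byte table is a small assumed subset. Because it checks only offset 0, it would miss an archive appended to an .exe.

```php
<?php
/**
 * Guess an archive format from the first bytes of a file, ignoring its
 * extension entirely. Returns null when no known signature matches.
 */
function detectArchiveFormat(string $path): ?string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $head = (string) fread($handle, 8);   // a few bytes are enough for these signatures
    fclose($handle);

    $magic = [
        'zip'  => "PK\x03\x04",           // non-empty zip, .docx, .jar, ...
        'rar'  => "Rar!\x1A\x07",         // common prefix of RAR 4.x and 5.0
        '7z'   => "7z\xBC\xAF\x27\x1C",
        'gzip' => "\x1F\x8B",
    ];
    foreach ($magic as $format => $sig) {
        if (strncmp($head, $sig, strlen($sig)) === 0) {
            return $format;
        }
    }
    return null;
}
```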