Will md5(file_contents_as_string) equal md5_file(/path/to/file)?


If I do:

<?php echo md5(file_get_contents("/path/to/file")) ?>

...will this always produce the same hash as:

<?php echo md5_file("/path/to/file") ?>


prehfeldt (best answer)

Yes, they return the same value:

var_dump(md5(file_get_contents(__FILE__)));
var_dump(md5_file(__FILE__));

which returns this in my case:

string(32) "4d2aec3ae83694513cb9bde0617deeea"
string(32) "4d2aec3ae83694513cb9bde0617deeea"

Edit: Look at the source code of both functions in https://github.com/php/php-src/blob/master/ext/standard/md5.c (lines 47 and 76 at the time of writing). Both use the same internal routines to compute the hash; the only difference is that md5_file() first opens the file and then feeds its contents to those routines.
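To illustrate the point about shared routines, here is a minimal sketch that hashes a file incrementally with PHP's hash_init()/hash_update()/hash_final(), reading it in fixed-size chunks the way md5_file() does internally, and compares the result with md5() over the whole string. The helper name md5_file_chunked() and the sample file are hypothetical, and the 1024-byte chunk size is an assumption; any chunk size yields the same digest because MD5 is computed over the concatenated bytes.

```php
<?php
// Hypothetical helper: hash a file in fixed-size chunks, feeding each
// chunk into one MD5 context (the chunk size is an assumption).
function md5_file_chunked(string $path, int $chunkSize = 1024): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $ctx = hash_init('md5');
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false) {
            break;
        }
        hash_update($ctx, $chunk); // incremental update, no full buffer
    }
    fclose($fh);
    return hash_final($ctx); // lowercase hex string, same format as md5()
}

// Hypothetical sample file, larger than one chunk.
$path = tempnam(sys_get_temp_dir(), 'md5');
file_put_contents($path, str_repeat("some data\n", 500));

$whole   = md5(file_get_contents($path)); // whole file loaded into memory
$builtin = md5_file($path);               // built-in, reads the file itself
$chunked = md5_file_chunked($path);       // chunked sketch

var_dump($whole === $builtin, $builtin === $chunked); // bool(true) bool(true)
unlink($path);
```

A practical consequence of this design: md5(file_get_contents(...)) needs the whole file in memory, while a chunked approach like md5_file() does not, so the latter is the safer choice for large files even though the hashes are identical.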

2nd Edit: Basically, md5_file() computes the hash from the file's contents, not from file metadata such as the filename. This is the same way md5sum works on Linux systems. See this example:

pr@testumgebung:~# echo foobar > foo.txt
pr@testumgebung:~# md5sum foo.txt
14758f1afd44c09b7992073ccf00b43d  foo.txt
pr@testumgebung:~# mv foo.txt bar.txt
pr@testumgebung:~# md5sum bar.txt
14758f1afd44c09b7992073ccf00b43d  bar.txt
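The same experiment can be sketched in PHP. The temporary file names below are hypothetical; the content "foobar\n" assumes that `echo foobar > foo.txt` writes a trailing newline, which is the shell's default behavior.

```php
<?php
// PHP version of the md5sum/mv demonstration: the hash depends only on
// the bytes in the file, so renaming the file does not change it.
$dir = sys_get_temp_dir();
$foo = $dir . DIRECTORY_SEPARATOR . 'foo_' . uniqid() . '.txt'; // hypothetical name
$bar = $dir . DIRECTORY_SEPARATOR . 'bar_' . uniqid() . '.txt'; // hypothetical name

// Equivalent of `echo foobar > foo.txt`: "foobar" plus a newline.
file_put_contents($foo, "foobar\n");
$before = md5_file($foo);

// Equivalent of `mv foo.txt bar.txt`.
rename($foo, $bar);
$after = md5_file($bar);

echo $before, "\n", $after, "\n"; // the same hash printed twice
unlink($bar);
```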
Mycelin

Quoting the answer above: "based on the file contents, not on the file metadata like the BOM or filename"

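That's not correct about the BOM. A BOM is part of the file content: in a UTF-8 file it is three bytes at the very start of the file, which you can see in any editor that does not interpret Unicode (a hex editor, for example).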
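A small sketch of why this matters for hashing, assuming a UTF-8 BOM (the bytes EF BB BF). The sample text and temporary files are hypothetical; the point is only that prepending the BOM changes the hashed bytes and therefore the hash.

```php
<?php
// A UTF-8 BOM is three bytes stored at the start of the file,
// so md5_file() hashes it along with the rest of the content.
$text = "hello world";   // hypothetical sample content
$bom  = "\xEF\xBB\xBF";  // UTF-8 byte order mark

$plain   = tempnam(sys_get_temp_dir(), 'nobom');
$withBom = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($plain, $text);
file_put_contents($withBom, $bom . $text);

$hashPlain   = md5_file($plain);
$hashWithBom = md5_file($withBom);

// The BOM bytes change the content, so the hashes differ.
var_dump($hashPlain === $hashWithBom);        // bool(false)
// md5() over the same bytes, BOM included, matches md5_file().
var_dump($hashWithBom === md5($bom . $text)); // bool(true)

unlink($plain);
unlink($withBom);
```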

Pier-Alexandre Bouchard

md5_file() simply hashes the contents of a file with MD5.

Here is an old implementation of md5_file written in PHP itself (the principle is still the same):

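function php_compat_md5_file($filename, $raw_output = false)
{
    // ...
    // (argument checks removed)

    // Open the file in binary mode. The snippet above elides this step;
    // a plain fopen() is assumed here so that the function runs.
    $fh = fopen($filename, 'rb');
    if ($fh === false) {
        return false;
    }

    // Read the whole file at once if its size is known,
    // otherwise read it in 8 KB chunks until end of file.
    if ($fsize = @filesize($filename)) {
        $data = fread($fh, $fsize);
    } else {
        $data = '';
        while (!feof($fh)) {
            $data .= fread($fh, 8192);
        }
    }

    fclose($fh);

    // Hash the contents exactly as md5() would.
    $data = md5($data);
    if ($raw_output === true) {
        // Convert the 32-character hex digest into 16 raw bytes.
        $data = pack('H*', $data);
    }

    return $data;
}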

So hashing any string with md5() gives the same result as md5_file() on a file that contains exactly the same bytes (same encoding, same content).
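The same equivalence holds for the raw binary form produced by the $raw_output branch of the function above. This sketch, using hypothetical sample contents, checks that the built-in md5_file($path, true) returns the 16 bytes encoded by the hex digest, and that they match md5($contents, true).

```php
<?php
// The raw output is just the hex digest decoded into 16 bytes.
$contents = "example contents"; // hypothetical sample data
$path = tempnam(sys_get_temp_dir(), 'raw');
file_put_contents($path, $contents);

$hex = md5_file($path);       // 32 hexadecimal characters
$raw = md5_file($path, true); // 16 raw bytes

var_dump(strlen($hex), strlen($raw));    // int(32) int(16)
var_dump($raw === pack('H*', $hex));     // bool(true)
var_dump($raw === md5($contents, true)); // bool(true)
var_dump(bin2hex($raw) === $hex);        // bool(true)

unlink($path);
```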

It follows that hashing a file's contents with md5(file_get_contents(...)), calling md5_file(), or running the md5 command-line tool on the same content all give the same result.

For example, renaming a file does not change its hash, and two different files with the same content produce the same MD5 hash.

For instance, consider two files named 1.txt and 2.txt, each containing stackoverflow (without the quotes and without a trailing newline):

md5_file("1.txt");
md5_file("2.txt");

Both calls output:

73868cb1848a216984dca1b6b0ee37bc

You get exactly the same result from md5("stackoverflow"), md5(file_get_contents("1.txt")), or md5(file_get_contents("2.txt")).
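The 1.txt/2.txt example can be reproduced with a short script. The temporary directory name is hypothetical, and the script deliberately does not hard-code the hash value; it only checks that all five ways of hashing agree. It also shows the one pitfall worth knowing: a trailing newline (as `echo stackoverflow > 1.txt` would add) is an extra byte, so the file no longer hashes like the bare string.

```php
<?php
// Reproduce the 1.txt / 2.txt example in a hypothetical temp directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'md5demo_' . uniqid();
mkdir($dir);
$one = $dir . DIRECTORY_SEPARATOR . '1.txt';
$two = $dir . DIRECTORY_SEPARATOR . '2.txt';
file_put_contents($one, 'stackoverflow'); // no trailing newline
file_put_contents($two, 'stackoverflow');

$hashes = [
    md5_file($one),
    md5_file($two),
    md5('stackoverflow'),
    md5(file_get_contents($one)),
    md5(file_get_contents($two)),
];
// All five values are the same string.
var_dump(count(array_unique($hashes)) === 1); // bool(true)

// Add a trailing newline: one extra byte, so a different hash.
file_put_contents($one, "stackoverflow\n");
$withNewline = md5_file($one);
var_dump($withNewline === md5('stackoverflow')); // bool(false)

unlink($one);
unlink($two);
rmdir($dir);
```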

Vishal

Yes, I tried it several times. In my case, this:

<?php echo md5(file_get_contents("1.php")) ?>
<br/>
<?php echo md5_file("1.php") ?>

produces the following output:

660d4e394937c10cd1c16a98f44457c2
660d4e394937c10cd1c16a98f44457c2

The two hashes are identical.