Hexdigest in Python


Why are the MD5 hexdigest values of two CSV files with the same content different when I compare them? The only difference between the two files is that one is tab-separated and the other is comma-separated; otherwise the values are the same.

import hashlib

f1 = open(r'D:\Temporary\New File.csv',mode='r')
f2 = open(r'D:\Temporary\Old File.csv',mode='r')
# t1 and t2 hold the full contents of each file
t1 = f1.read()
t2 = f2.read()
print hashlib.md5(t1).hexdigest(),'    ',hashlib.md5(t2).hexdigest()
if hashlib.md5(t1).hexdigest() == hashlib.md5(t2).hexdigest():
    print "Match"
else:
    print "Not Match"

The output shows:

a4b2720cafdcb859e7ef07a7a3564ba3      237a5c28b890f94636035482a363853a
Not Match

On the other hand, the following code gives what looks like the correct output. Here I added a call to read() on each file before computing the MD5 digests, and now the digests match.

import hashlib

f1 = open(r'D:\Temporary\New File.csv',mode='r')
f2 = open(r'D:\Temporary\Old File.csv',mode='r')
print f1.read()
print f2.read()
# t1 and t2 are read after the files have already been printed
t1 = f1.read()
t2 = f2.read()
print hashlib.md5(t1).hexdigest(),'    ',hashlib.md5(t2).hexdigest()
if hashlib.md5(t1).hexdigest() == hashlib.md5(t2).hexdigest():
    print "Match"
else:
    print "Not Match"

Now, the output is:

Ultimator Start Code Start Count    
Ultimator,Start Code,Start Count,,,,
d41d8cd98f00b204e9800998ecf8427e      d41d8cd98f00b204e9800998ecf8427e
Match

There is 1 answer

tynn (best answer)

MD5 is a cryptographic hash function that operates on the raw bytes of a file. If two CSV files hold the same values but use different delimiters, their raw bytes differ, so their MD5 hexdigest values differ as well.
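To make this concrete, here is a minimal sketch written for Python 3 (where hashlib requires bytes). The sample bytes and the helper name parse_rows are made up for illustration and are not from the question. It shows that changing only the delimiter changes the digest, and that comparing the parsed fields with the csv module is one possible way to check whether the values themselves match.

```python
import csv
import hashlib
import io

# Hypothetical sample data: the same three fields, delimited two ways.
tab_data = b"Ultimator\tStart Code\tStart Count\n"
comma_data = b"Ultimator,Start Code,Start Count\n"

# MD5 sees only bytes, so a tab (0x09) versus a comma (0x2c) changes the digest.
tab_digest = hashlib.md5(tab_data).hexdigest()
comma_digest = hashlib.md5(comma_data).hexdigest()
print(tab_digest == comma_digest)  # False


def parse_rows(raw, delimiter):
    """Parse raw CSV bytes into a list of rows (lists of field strings)."""
    text = io.StringIO(raw.decode("utf-8"), newline="")
    return list(csv.reader(text, delimiter=delimiter))


# Comparing parsed fields ignores the choice of delimiter.
same_fields = parse_rows(tab_data, "\t") == parse_rows(comma_data, ",")
print(same_fields)  # True
```

Real files may need extra normalization before such a comparison. For example, the comma-separated output in the question ends with several empty fields that the tab-separated one may not have.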

If you call file.read() first, the file's position pointer is left at the end of the file, so a second call to file.read() returns '' (an empty string). The digest that matched for both files is simply the MD5 of the empty string:

>>> hashlib.md5('').hexdigest()
'd41d8cd98f00b204e9800998ecf8427e'
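The same effect can be reproduced without the original files. The sketch below is written for Python 3 and uses io.BytesIO as a stand-in for a file opened in binary mode; the sample bytes are made up. It shows the second read() returning nothing, and that rewinding with seek(0) (or simply reading once into a variable and reusing it) gives the real contents back.

```python
import hashlib
import io

# io.BytesIO stands in for a file opened in binary mode; the bytes are made up.
f = io.BytesIO(b"Ultimator,Start Code,Start Count\n")

first = f.read()   # consumes everything and leaves the position at the end
second = f.read()  # nothing left to read, so this is b""

empty_digest = hashlib.md5(second).hexdigest()
print(empty_digest)  # d41d8cd98f00b204e9800998ecf8427e

# Rewinding makes the contents readable again.
f.seek(0)
third = f.read()
print(hashlib.md5(third).hexdigest() == hashlib.md5(first).hexdigest())  # True
```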