The below code shows that three files which are on a UNC share hosted on another machine have the same hash. It also shows that local files have different hashes. Why would this be? I feel that there is some UNC consideration that I don't know about.
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> fn_a = '\\\\some.host.com\\Shares\\folder1\\file_a'
>>> fn_b = '\\\\some.host.com\\Shares\\folder1\\file_b'
>>> fn_c = '\\\\some.host.com\\Shares\\folder2\\file_c'
>>> fn_d = 'E:\\file_d'
>>> fn_e = 'E:\\file_e'
>>> fn_f = 'E:\\folder3\\file_f'
>>> f_a = open(fn_a, 'r')
>>> f_b = open(fn_b, 'r')
>>> f_c = open(fn_c, 'r')
>>> f_d = open(fn_d, 'r')
>>> f_e = open(fn_e, 'r')
>>> f_f = open(fn_f, 'r')
>>> hashlib.md5(f_a.read()).hexdigest()
'54637fdcade4b7fd7cabd45d51ab8311'
>>> hashlib.md5(f_b.read()).hexdigest()
'54637fdcade4b7fd7cabd45d51ab8311'
>>> hashlib.md5(f_c.read()).hexdigest()
'54637fdcade4b7fd7cabd45d51ab8311'
>>> hashlib.md5(f_d.read()).hexdigest()
'd2bf541b1a9d2fc1a985f65590476856'
>>> hashlib.md5(f_e.read()).hexdigest()
'e84be3c598a098f1af9f2a9d6f806ed5'
>>> hashlib.md5(f_f.read()).hexdigest()
'e11f04ed3534cc4784df3875defa0236'
EDIT: To further investigate the problem, I also tested using a file from another host. It appears that changing the host will change the result.
>>> fn_h = '\\\\host\\share\\file'
>>> f_h = open(fn_h, 'r')
>>> hashlib.md5(f_h.read()).hexdigest()
'f23ee2dbbb0040bf2586cfab29a03634'
...but then I tried a different file on the new host, and got a new result!
>>> fn_i = '\\\\host\\share\\different_file'
>>> f_i = open(fn_i, 'r')
>>> hashlib.md5(f_i.read()).hexdigest()
'a8ad771db7af8c96f635bcda8fdce961'
So, now I'm really confused. Could it have something to do with the fact that the original host is a \\host.com
format and the new host is a \\host
format?
I did some additional research based on the comments and answers everyone provided. I decided I needed to study permutations of these two features of the code:
A raw string literal is used for the path name, i.e. whether or not:
A. The file path string is raw with single backslashes in the path, vs.
B. The file path string is not raw with double backslashes in the path
(FYI to those who don't know, a raw string is one which is proceeded by an "r" like this:
r'This is a raw string'
)The
open
function mode isr
orrb
.(FYI again to those who don't know, the
b
inrb
mode indicates to read the file as binary.)The results demonstrated:
rb
mode inopen
, I got different results.Yay! And thanks for the help.