I'm using hashlib to create a SHA-256 hash of a file, which is then compared against the file's previous hash, stored in a database.
import hashlib

def create_hash(file, id, hex=True, hash_type=hashlib.sha256):
    con = db.connect(db_host, db_user, db_pass, db_db)
    cur = con.cursor()
    hashinst = hash_type()
    # Read the file in chunks so large files never sit fully in memory.
    with open(file, 'rb') as f:
        for chunk in iter(lambda: f.read(hashinst.block_size * 128), b''):
            hashinst.update(chunk)
    # Name it file_hash to avoid shadowing the built-in hash().
    file_hash = hashinst.hexdigest() if hex else hashinst.digest()
    print(file_hash)
    # The parameter tuple needs a trailing comma: (id) is just id.
    cur.execute("SELECT * FROM Previews WHERE S_Id = %s", (id,))
    row = cur.fetchone()
    cur_hash = row[1]
    if file_hash == cur_hash:
        # Unchanged file: bump the counter.
        cur.execute("UPDATE Previews SET Count = %s WHERE S_Id = %s", (row[2] + 1, id))
    else:
        # Changed file: reset the counter and store the new hash in one statement.
        cur.execute("UPDATE Previews SET Count = 0, Hash = %s WHERE S_Id = %s", (file_hash, id))
    con.commit()
    con.close()
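The chunked-hashing part can be tried on its own. A minimal, self-contained sketch with the database logic stripped out (`hash_file` is a hypothetical name, not from the code above); reading `block_size * 128` bytes at a time keeps memory bounded for large files:

```python
import hashlib

def hash_file(path, hash_type=hashlib.sha256, chunks=128):
    """Hash a file incrementally so large files never sit fully in memory."""
    h = hash_type()
    with open(path, 'rb') as f:
        # read block_size * chunks bytes per iteration until EOF (b'')
        for chunk in iter(lambda: f.read(h.block_size * chunks), b''):
            h.update(chunk)
    return h.hexdigest()
```

The returned hex digest can then be compared directly with the one stored in the database.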
Speed is a must for this, so I am also using the multiprocessing module:

pool = multiprocessing.Pool(processes=pCount)
pool.map(create_preview, rows)
This calls a function create_preview, which creates the images and calls the function above. The issue is that all the hashes come out the same. If I do this in a for loop instead of using the multiprocessing pool, I have no issues and all the hashes are different.
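hashlib itself is not the problem here: each worker process builds its own independent hash object. A small sketch (hypothetical helper names, temporary files standing in for the real previews) showing that a Pool produces distinct digests for distinct files, which suggests the bug is in what gets passed to the workers rather than in hashlib:

```python
import hashlib
import multiprocessing
import os
import tempfile

def sha256_of(path):
    """Plain chunked SHA-256; safe to call from worker processes."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

def demo():
    # Two temporary files with different contents.
    paths = []
    for payload in (b'first file', b'second file'):
        fd, path = tempfile.mkstemp()
        os.write(fd, payload)
        os.close(fd)
        paths.append(path)
    with multiprocessing.Pool(processes=2) as pool:
        digests = pool.map(sha256_of, paths)
    for p in paths:
        os.unlink(p)
    return digests

if __name__ == '__main__':
    print(demo())  # two different hex digests
```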
Does anyone know of any issues these may be with using the hashlib module and multiprocessing or an alternate method which I could use to compare the files?
So, create_preview is receiving the whole row tuple rather than just its first column. You should pass only row[0] to it. Since Pool.map cannot pickle a lambda, wrap the call in a module-level function:

def create_preview_row(row):
    return create_preview(row[0])

pool.map(create_preview_row, rows)

instead of pool.map(create_preview, rows).