My implementation in Python calculates the merkle root hash for ~1500 input hashes:
```python
import numpy as np
from binascii import unhexlify, hexlify
from hashlib import sha256

txids = np.loadtxt("txids.txt", dtype=str)

def double_sha256(a, b):
    inp = unhexlify(a)[::-1] + unhexlify(b)[::-1]
    sha1 = sha256(inp).digest()
    sha2 = sha256(sha1).digest()
    return hexlify(sha2[::-1])

def calculate_merkle_root(inp_list):
    if len(inp_list) == 1:
        return inp_list[0]
    out_list = []
    for i in range(0, len(inp_list) - 1, 2):
        out_list.append(double_sha256(inp_list[i], inp_list[i + 1]))
    if len(inp_list) % 2 == 1:
        out_list.append(double_sha256(inp_list[-1], inp_list[-1]))
    return calculate_merkle_root(out_list)

for i in range(1000):
    merkle_root_hash = calculate_merkle_root(txids)
print(merkle_root_hash)
```
Since the merkle root is calculated 1000 times, one calculation takes ~5 ms:
```
$ time python3 test.py
b'289792577c66cd75f5b1f961e50bd8ce6f36adfc4c087dc1584f573df49bd32e'

real    0m5.132s
user    0m5.501s
sys     0m0.133s
```
How could I improve the speed of the calculation? Can this code be optimized?
So far, I have tried to unroll the recursive function in both Python and C++. However, performance did not improve; one calculation took ~6 ms.
EDIT
The file is available here: txids.txt
EDIT 2
Following a suggestion in a comment, I removed the unnecessary `unhexlify` and `hexlify` steps. The list is now prepared once, before the loop.
```python
def double_sha256(a, b):
    inp = a + b
    sha1 = sha256(inp).digest()
    sha2 = sha256(sha1).digest()
    return sha2

def map_func(t):
    return unhexlify(t)[::-1]

txids = list(map(map_func, txids))

for i in range(1000):
    merkle_root_hash = calculate_merkle_root(txids)
merkle_root_hash = hexlify(merkle_root_hash[::-1])
```
Now one calculation takes ~4 ms:
```
$ time python3 test2.py
b'289792577c66cd75f5b1f961e50bd8ce6f36adfc4c087dc1584f573df49bd32e'

real    0m3.697s
user    0m4.069s
sys     0m0.128s
```
In the last update (2 May 2021 at 17:00), the calls to `sha256(value).digest()` take roughly 80% of the time on my machine. There are a few possible solutions.

The first is to parallelize the computation using `multiprocessing`, assuming the work is independent for each iteration. This is 4 times faster on my 6-core machine.
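A minimal sketch of what such a parallelization might look like, since the 1000 iterations are independent and can be split across a process pool. The worker function, pool size, and dummy input data below are my assumptions (the real input would come from txids.txt), not the answer's original code:

```python
from hashlib import sha256
from multiprocessing import Pool

def double_sha256(a, b):
    # one double-SHA256 step over two raw 32-byte hashes
    return sha256(sha256(a + b).digest()).digest()

def calculate_merkle_root(inp_list):
    if len(inp_list) == 1:
        return inp_list[0]
    out_list = []
    for i in range(0, len(inp_list) - 1, 2):
        out_list.append(double_sha256(inp_list[i], inp_list[i + 1]))
    if len(inp_list) % 2 == 1:
        # odd count: duplicate the last hash, as in the original code
        out_list.append(double_sha256(inp_list[-1], inp_list[-1]))
    return calculate_merkle_root(out_list)

def repeat_root(args):
    # each worker recomputes the root `n` times independently
    txids, n = args
    root = None
    for _ in range(n):
        root = calculate_merkle_root(txids)
    return root

if __name__ == "__main__":
    # placeholder input: 1500 dummy 32-byte hashes (txids.txt is not included here)
    txids = [sha256(str(i).encode()).digest() for i in range(1500)]
    workers = 4          # assumed pool size; tune to your core count
    iterations = 1000
    with Pool(workers) as pool:
        # split the 1000 iterations evenly across the pool
        roots = pool.map(repeat_root, [(txids, iterations // workers)] * workers)
    print(roots[0].hex())
```

Note that each worker receives the full `txids` list (pickled once per task), so the speedup comes from splitting the repeated root computations, not from parallelizing a single tree.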
Another solution is to find a faster Python module that can compute multiple sha256 hashes at once, reducing the overhead of the repeated C calls from the CPython interpreter. I am not aware of any package that does this.
Finally, one efficient solution is to (at least partially) rewrite the expensive `calculate_merkle_root` computation in C or C++ and run it in parallel. This should be significantly faster than your current code, as it removes the function-call overhead and the multiprocessing cost. There are many libraries for computing a sha256 hash (such as the Crypto++ library).