I just found out about the AWS Glacier service and wanted to write a small Python application to upload files via the REST API. I took a look at the required headers and stumbled upon x-amz-sha256-tree-hash
. I need to calculate the SHA-256 hash of the entire file as well as the hash of the parent of all hashes of each 1 MB chunk. This leads to the following tree:
(Image taken from here)
I already made a function that reads 1 MB chunks and a class which calculates their hashes on-the-fly but then I completely struggle:
In my application I made a class called chunk
which takes data and calculates a hash in the __init__
method as well as holds parent and children (like a regular tree).
When the user opens a file those chunks instances will be generated properly with their respective hashes (in this example that would be 7 chunk instances).
Now I have two big problems that are connected to each other:
- How do I build this tree in reverse? I basically need to make a new chunk for each two chunk instances on the lowest layer and calculate a hash based on those two hashes. But where do I store that parent? In the children of the parent and do reverse tree walking?
- How does that work with an odd number of children? If I have an algorithm that goes through each parent layer then I would miss the last (0.5 MB) chunk.
I checked out this topic on SO but that method only works with an even children count which is not always given.
Can you help me finding a way/an algorithm/an approach on how to solve this issue?
Thanks in advance!
Paul
First calculate the number of levels, then
Make sure to handle None as second argument in hashofthem :)