Reverse tree building (with an odd number of children)

1.2k views Asked by At

I just found out about the AWS Glacier service and wanted to write a small Python application to upload files via the REST API. I took a look at the required headers and stumbled upon x-amz-sha256-tree-hash. I need to calculate the SHA-256 hash of the entire file as well as the hash of the parent of all hashes of each 1 MB chunk. This leads to the following tree:

AWS's SHA-256 Tree Hash procedure

(Image taken from here)

I already made a function that reads 1 MB chunks and a class which calculates their hashes on-the-fly but then I completely struggle:

In my application I made a class called chunk which takes data and calculates a hash in the __init__ method as well as holds parent and children (like a regular tree). When the user opens a file those chunks instances will be generated properly with their respective hashes (in this example that would be 7 chunk instances).

Now I have two big problems that are connected to each other:

  1. How do I build this tree in reverse? I basically need to make a new chunk for each two chunk instances on the lowest layer and calculate a hash based on those two hashes. But where do I store that parent? In the children of the parent and do reverse tree walking?
  2. How does that work with an odd number of children? If I have an algorithm that goes through each parent layer then I would miss the last (0.5 MB) chunk.

I checked out this topic on SO but that method only works with an even children count which is not always given.

Can you help me finding a way/an algorithm/an approach on how to solve this issue?

Thanks in advance!

Paul

1

There are 1 answers

3
unddoch On BEST ANSWER

First calculate the number of levels, then

def proclevel(levels):
    if levels > 0:
        generator = proclevel(levels - 1)
        temp = None
        for firsthash, secondhash in generator:
            if not temp: temp = hashofthem(firsthash, secondhash)
            else: yield temp, hashofthem(firsthash, secondhash); temp = None
        #If odd number of packets
        if temp: yield temp, None
    else:
        temp = None
        for chunk in chunks:
            if not temp: temp = hash(chunk)
            else: yield temp, hash(chunk); temp = None
        if temp: yield temp, None

Make sure to handle None as second argument in hashofthem :)