Create torrent hash info


How do I generate the info hash for a torrent file?

I have been looking at this example, "How to calculate the hash value of a torrent using Java", and am trying to convert it to C++. This is the code I have so far:

void At::ReadTorrent::TorrentParser::create_hash(std::string torrentstub)
{
    std::string info;
    int counter = 0;

    while(info.find("4:info") == -1)
    {
        info.push_back(torrentstub[counter]);
        counter++;
    }

    unsigned char array[torrentstub.size()];
    int test = 0;

    for(int data; (data = torrentstub[counter]) > -1;)
    {
         array[test++] = data;
         counter++;
    }
    std::cout << array << std::endl;

    //SHA-1 some value here to generate the hash.
}

The torrentstub parameter is the content of the torrent file represented as a string. As far as I understand, I have to collect the data that comes after 4:info. I think this works okay, for example:

d6:lengthi2847431620e4:name8:filename12:piece lengthi1143252e6:pieces50264

After this there is only data that I can't read. I guess it is binary data?

So my question boils down to this: is the data that should be hashed everything that comes after 4:info, and where should I stop collecting data for the hash?


There are 2 answers

0
Not Submitted On BEST ANSWER

The sample code you based this on seems to assume that the info key is the last thing in the torrent file (it may not be, so read the entire answer to get the whole story). Under that assumption, the hashed data covers the remainder of the file (minus 1 byte), starting at the byte following ":info". You would see something like "...:infod6:length...". The SHA1 input starts with "d6:length..." and runs to the end of the file minus 1 byte (the last byte, usually 'e', is not included).

For example, if the torrent file is 43125 bytes and ":info" starts at offset 362, then the SHA data starts at offset 367 and continues to offset 43123 (that is, it's 42757 bytes).
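The arithmetic in this example can be written down as a small helper. This is only a sketch: the names HashRange and trailing_info_range are made up for illustration, and it relies on the same assumption as the example, namely that the info dictionary is the last entry in the file.

```cpp
#include <cstddef>

// Hypothetical helper type: where the hashed bytes start and how many there are.
struct HashRange {
    std::size_t begin;   // offset of the 'd' that opens the info dictionary
    std::size_t length;  // number of bytes fed to SHA1
};

// Assumes the info dictionary is the last entry in the file, so only the
// final 'e' (which closes the outer dictionary) is left out of the hash.
// key_offset is the offset of the ':' in ":info".
HashRange trailing_info_range(std::size_t file_size, std::size_t key_offset) {
    const std::size_t key_length = 5;                    // length of ":info"
    const std::size_t begin = key_offset + key_length;   // first hashed byte
    const std::size_t last = file_size - 2;              // last hashed byte
    return {begin, last - begin + 1};
}
```

With the numbers above, trailing_info_range(43125, 362) gives a start offset of 367 and a length of 42757.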

That approach only works if the info key really is the last entry in your torrent files. If you can't rely on that, your algorithm must be a little more sophisticated. A torrent file is bencoded, and the value of the info key is a bencode "dictionary" (search for bencode on Wikipedia and read the article; it's pretty simple to understand). The "d" following ":info" starts the dictionary, which ends with an "e". The length of the dictionary is not encoded, so the only way to know where it ends is to parse its contents until you find the "e" that closes it. If the file is correctly formatted, the contents of the dictionary consist of a series of well-formed bencoded elements (possibly with further nested elements). Eventually you will find an "e" where the next element would otherwise start. This "e" ends the dictionary. The SHA1 is computed over the entire dictionary, including the opening "d" and the closing "e". Other bencoded elements may follow it. These are NOT included in the SHA1 calculation.
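The parsing described above might be sketched roughly as follows. The function names skip_element and extract_info are hypothetical, the whole file is assumed to already be in a std::string, and error handling is kept minimal (for instance, absurdly large string lengths are not guarded against overflow).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the offset just past the bencoded element that starts at pos.
// Throws std::runtime_error on malformed or truncated input.
std::size_t skip_element(const std::string& data, std::size_t pos) {
    if (pos >= data.size()) throw std::runtime_error("truncated bencode");
    const char c = data[pos];
    if (c == 'i') {  // integer: i<digits>e
        const std::size_t end = data.find('e', pos);
        if (end == std::string::npos) throw std::runtime_error("unterminated integer");
        return end + 1;
    }
    if (c == 'l' || c == 'd') {  // list or dictionary: elements until the closing 'e'
        ++pos;
        while (pos < data.size() && data[pos] != 'e') pos = skip_element(data, pos);
        if (pos >= data.size()) throw std::runtime_error("unterminated container");
        return pos + 1;
    }
    if (c >= '0' && c <= '9') {  // byte string: <length>:<bytes>
        std::size_t len = 0;
        while (pos < data.size() && data[pos] >= '0' && data[pos] <= '9')
            len = len * 10 + static_cast<std::size_t>(data[pos++] - '0');
        if (pos >= data.size() || data[pos] != ':') throw std::runtime_error("bad string length");
        ++pos;
        if (len > data.size() - pos) throw std::runtime_error("truncated string");
        return pos + len;  // the bytes themselves are skipped, never inspected
    }
    throw std::runtime_error("unexpected byte");
}

// Walks the top-level dictionary and returns the raw bytes of the value stored
// under the "info" key, including its opening 'd' and closing 'e'.
std::string extract_info(const std::string& torrent) {
    if (torrent.empty() || torrent[0] != 'd') throw std::runtime_error("not a dictionary");
    std::size_t pos = 1;
    while (pos < torrent.size() && torrent[pos] != 'e') {
        const std::size_t key_end = skip_element(torrent, pos);
        const std::size_t value_end = skip_element(torrent, key_end);
        if (torrent.compare(pos, key_end - pos, "4:info") == 0)
            return torrent.substr(key_end, value_end - key_end);
        pos = value_end;
    }
    throw std::runtime_error("no info key");
}
```

Because the walk only looks at keys of the top-level dictionary, it is not confused by the bytes "4:info" appearing inside some other string, and it stops at the right "e" even when more keys follow the info dictionary.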

Misc. notes:

Assuming the info key is the last thing in the file (again, it may not be), the single byte that is "left out" of the SHA1 in your algorithm is the final "e" of the entire torrent (which is just a single bencode dictionary: all torrent files begin with "d" and end with "e").

This is binary data, so you must read it as such when filling torrentstub[].

You cannot test for -1 to determine when to end as you do in your example. The code it is based on looks at the result of the read operation when testing for -1 (eof), not the data itself. You must use the length of the torrent file, minus the start of the data (after ":info") minus 1 to get the right length.

The sample code you reference actually does read the last byte but excludes it when generating the SHA1.

Reading one byte, copying it to the string, and then re-scanning the whole string each time is very inefficient. You already have the data in memory, so just search it once: use strstr (acceptable here only because the beginning of the file is ASCII data) or scan it yourself (not too hard to code, since it's a very short, fixed-length string).

I assume you have code to do the actual SHA1. What platform are you working on?
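Several of the notes above (read the file as binary, search once, use the file length instead of testing for -1) could be combined along these lines. This is a sketch under the assumption that the info dictionary is the last entry in the file. The function names are made up, and std::string::find is used in place of strstr because it works on the string's length and so does not stop at embedded zero bytes.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Reads a whole file as raw bytes. std::ios::binary disables any newline
// translation, so the bytes arrive exactly as they are on disk.
std::string read_binary_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Assumes "info" is the LAST key of the torrent: returns everything after the
// first "4:info" except the final byte (the 'e' closing the outer dictionary).
std::string trailing_info_bytes(const std::string& torrent) {
    const std::string key = "4:info";
    const std::size_t key_pos = torrent.find(key);  // single search, length-aware
    if (key_pos == std::string::npos) throw std::runtime_error("no info key found");
    const std::size_t begin = key_pos + key.size();
    if (torrent.size() < begin + 1) throw std::runtime_error("file too short");
    // Length comes from the file size, not from a sentinel value in the data.
    return torrent.substr(begin, torrent.size() - begin - 1);
}
```

A call such as trailing_info_bytes(read_binary_file("example.torrent")) (the file name is a placeholder) would then yield the bytes to pass to the SHA1 function.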

3
Cory Nelson On

The .torrent spec is freely available and should help you understand the file format quite easily. All you need to do is SHA1 the contents of the info key to get the info hash.
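For completeness, here is a compact, self-contained SHA-1 sketch, so that the last step can be tried without any external dependency. In real code an established library (for example OpenSSL's SHA1 function from <openssl/sha.h>) is the more sensible choice. The names sha1 and to_hex are illustrative, and the input is assumed to be the raw bytes of the info dictionary held in a std::string.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

static std::uint32_t rotl(std::uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Returns the 20-byte SHA-1 digest of data.
std::array<std::uint8_t, 20> sha1(const std::string& data) {
    std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the message length
    // in bits as a 64-bit big-endian integer.
    std::string msg = data;
    const std::uint64_t bit_len = static_cast<std::uint64_t>(data.size()) * 8;
    msg.push_back(static_cast<char>(0x80));
    while (msg.size() % 64 != 56) msg.push_back('\0');
    for (int shift = 56; shift >= 0; shift -= 8)
        msg.push_back(static_cast<char>((bit_len >> shift) & 0xFF));

    for (std::size_t off = 0; off < msg.size(); off += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i) {  // load 16 big-endian words
            w[i] = 0;
            for (int j = 0; j < 4; ++j)
                w[i] = (w[i] << 8) | static_cast<std::uint8_t>(msg[off + 4 * i + j]);
        }
        for (int i = 16; i < 80; ++i)   // message schedule
            w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t t = rotl(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    std::array<std::uint8_t, 20> digest{};
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 4; ++j)
            digest[4 * i + j] = static_cast<std::uint8_t>(h[i] >> (24 - 8 * j));
    return digest;
}

// Formats a digest as the 40-character lowercase hex string usually shown as the info hash.
std::string to_hex(const std::array<std::uint8_t, 20>& digest) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::uint8_t byte : digest) {
        out.push_back(digits[byte >> 4]);
        out.push_back(digits[byte & 0x0F]);
    }
    return out;
}
```

Calling to_hex(sha1(info_bytes)), where info_bytes holds the info dictionary from its opening "d" through its closing "e", gives the info hash in hex form.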