Unable to create a torrent's info hash

I'm having trouble finding the issue with how I'm generating the corresponding info hash for a torrent file. This is the code I have so far:

InputStream input = null;
try {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    input = new FileInputStream(file);
    StringBuilder builder = new StringBuilder();
    while (!builder.toString().endsWith("4:info")) {
       builder.append((char) input.read()); // It's ASCII anyway.
    }
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    for (int data; (data = input.read()) > -1; output.write(data));
    sha1.update(output.toByteArray(), 0, output.size() - 1);
    this.infoHash = sha1.digest();
    System.out.println(new String(Hex.encodeHex(infoHash)));
} catch (NoSuchAlgorithmException | IOException e) {
     e.printStackTrace();
} finally {
    if (input != null) try { input.close(); } catch (IOException ignore) {}
}

Below is my expected and actual hash:

Expected: d4d44272ee5f5bf887a9c85ad09ae957bc55f89d
Actual: 4d753474429d817b80ff9e0c441ca660ec5d2450

The torrent I'm trying to generate an info hash for can be found here (Ubuntu 14.04 Desktop amd64).

Let me know if I can provide any more info, thanks!

There are 2 answers

rzwitserloot On BEST ANSWER

Exceptions contain 4 useful bits of info: type, message, trace, and cause. You're tossing away 3 of the 4. Also, code is part of a process, and when an error occurs, that process generally cannot be finished at all. And yet on exceptions your process continues. Stop doing this; you've written code that only hurts you. Remove the try and the catch, and add a throws clause to your method signature. If you can't, the go-to default (and update your IDE to do this, if it generated this code) is throw new RuntimeException("Unhandled", e);. This is shorter, does not destroy any of the 4 interesting bits of info, and ends the process.
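
As a rough sketch of those two options (the class and method names here are made up for illustration and are not from the question):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ErrorHandlingSketch {
    // Option 1: declare the exception and let the caller decide what to do.
    static byte[] readTorrent(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Option 2: the signature cannot change (say, it implements an interface),
    // so wrap the exception. Type, message, trace and cause all survive,
    // and the process stops instead of continuing with bad data.
    static byte[] readTorrentUnchecked(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new RuntimeException("Unhandled", e);
        }
    }
}
```

Either way, a missing or unreadable file ends the operation with a full stack trace rather than producing a hash of partial data.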

Separately, the notion that the right way to handle an IOException from an InputStream's close method is to just ignore it is also false. It is highly unlikely to throw, but if it does, you should assume you didn't read every byte. As that would be one explanation for a mismatched hash, ignoring it is misguided.

Finally, use the proper language constructs: There is a try-with-resources statement that would work far better here.
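
A minimal sketch of what try-with-resources could look like here, assuming Java 9+ for InputStream.readAllBytes (the class and method names are placeholders):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class TorrentFile {
    static byte[] readTorrent(Path file) throws IOException {
        // The stream is closed automatically when the block exits,
        // whether normally or via an exception. An IOException thrown
        // by close() propagates to the caller instead of being swallowed.
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        }
    }
}
```

This replaces the null check, the finally block, and the nested try around close() in the original code.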

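You're calling update with output.size() - 1, which lops off the last byte read. That is only correct if that byte is the 'e' closing the top-level dictionary, i.e. if info is the last key in the file; unless you are dropping it intentionally for that reason, this is a mistake.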

Reading bytes into a builder, and then, for every byte, converting the builder to a string and checking whether it ends with the marker is incredibly inefficient; even for a file as small as 1MB that'll cause quite a grind.

Reading a single byte at a time from a raw FileInputStream is similarly inefficient, because every read causes file access (reading 1 byte is about as expensive as reading a whole buffer full, so it can be orders of magnitude slower than it needs to be).
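
If you do want to stream rather than load the whole file, one way to avoid per-byte reads is to pull data in chunks and feed each chunk to the digest. This is only a sketch; the class name and the 8192-byte buffer size are arbitrary choices, not anything from the question:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChunkedSha1 {
    // Hashes everything remaining in the stream, reading up to 8 KiB per call
    // instead of one byte per call.
    static byte[] sha1(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        for (int n; (n = in.read(buffer)) != -1; ) {
            sha1.update(buffer, 0, n); // only the n bytes actually read
        }
        return sha1.digest();
    }
}
```

Note that this hashes the stream to its end, so by itself it does not solve the problem of selecting the right byte range of the torrent file.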

Here's how to do this with a somewhat newer API; note how much more readable the code is. It also behaves better under error conditions:

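import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

byte[] data = Files.readAllBytes(Paths.get(fileName));
var search = "4:info".getBytes(StandardCharsets.US_ASCII);
// Find the first occurrence of the marker; the info value starts right after it.
int searchIdx = -1;
for (int i = 0; searchIdx == -1 && i <= data.length - search.length; i++) {
    for (int j = 0; j < search.length; j++) {
        if (data[i + j] != search[j]) break;
        if (j == search.length - 1) searchIdx = i + search.length;
    }
}
if (searchIdx == -1 || searchIdx >= data.length) throw new IOException("Input torrent file does not contain marker");

var sha1 = MessageDigest.getInstance("SHA-1");
// Hash from the start of the info value up to, but excluding, the final 'e'
// that closes the top-level dictionary (this assumes info is the last key).
sha1.update(data, searchIdx, data.length - searchIdx - 1);
byte[] hash = sha1.digest();
StringBuilder hex = new StringBuilder();
for (byte h : hash) hex.append(String.format("%02x", h));
System.out.println(hex);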
the8472 On

While rzwitserloot's answer covers some general Java coding practices, there are also correctness issues on the BitTorrent level.

You are using string processing for a structured data format; this is pretty much the same mistake as attempting to parse HTML with regex. In this case you're assuming that the only place the data can contain the string 4:info is the top-level dictionary key for the info dict, and that the info dictionary is the last entry of the top-level dictionary.

Instead, you should use a proper bencoding decoder-encoder to extract the info dict and then re-encode it for hashing, or a tokenizer to find the exact byte range covering the info value. Note that you need a validating parser for the former, while the latter can also handle some out-of-spec edge cases. Unless you want to implement them yourself, you may want to find a library that handles this for you.
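
To illustrate the tokenizer approach, here is a rough, non-validating sketch that walks the top-level dictionary and returns the byte range of the info value. The class and method names are invented for this example, it recurses on nested lists and dicts (so extremely deep nesting would overflow the stack), and a real library would be more thorough:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class InfoRange {
    /** Returns {start, end} (end exclusive) of the value under the top-level "info" key. */
    static int[] find(byte[] d) {
        if (d.length == 0 || d[0] != 'd') throw new IllegalArgumentException("not a bencoded dictionary");
        int pos = 1;
        while (byteAt(d, pos) != 'e') {
            int keyEnd = skipString(d, pos);      // dictionary keys are strings
            int valueEnd = skipValue(d, keyEnd);
            // A 4-byte key is encoded as "4:" followed by the key bytes.
            boolean isInfo = keyEnd - pos == 6
                    && new String(d, pos, 6, StandardCharsets.US_ASCII).equals("4:info");
            if (isInfo) return new int[] {keyEnd, valueEnd};
            pos = valueEnd;
        }
        throw new IllegalArgumentException("no top-level info key");
    }

    /** SHA-1 over exactly the bytes of the info value. */
    static byte[] infoHash(byte[] torrent) throws NoSuchAlgorithmException {
        int[] r = find(torrent);
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(torrent, r[0], r[1] - r[0]);
        return sha1.digest();
    }

    /** Returns the index just past the bencoded value starting at pos. */
    static int skipValue(byte[] d, int pos) {
        byte c = byteAt(d, pos);
        if (c == 'i') {                           // integer: i<digits>e
            while (byteAt(d, pos) != 'e') pos++;
            return pos + 1;
        }
        if (c == 'l' || c == 'd') {               // list or dict: l...e / d...e
            pos++;
            while (byteAt(d, pos) != 'e') pos = skipValue(d, pos);
            return pos + 1;
        }
        return skipString(d, pos);                // string: <length>:<bytes>
    }

    /** Returns the index just past the string starting at pos. */
    static int skipString(byte[] d, int pos) {
        int colon = pos;
        while (byteAt(d, colon) != ':') {
            if (d[colon] < '0' || d[colon] > '9') throw new IllegalArgumentException("bad string length at " + colon);
            colon++;
        }
        long len = Long.parseLong(new String(d, pos, colon - pos, StandardCharsets.US_ASCII));
        long end = colon + 1L + len;
        if (end > d.length) throw new IllegalArgumentException("string runs past end of input");
        return (int) end;
    }

    private static byte byteAt(byte[] d, int pos) {
        if (pos >= d.length) throw new IllegalArgumentException("truncated input");
        return d[pos];
    }
}
```

Because string contents are skipped by length rather than scanned, a 4:info that appears inside a comment or file name is never mistaken for the key, and keys that follow info in the top-level dictionary are excluded from the hash.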

Additionally, you're assuming that the data is ASCII. Bencoding is in fact a binary format that just tends to use ASCII by convention in some places. You should operate on byte arrays directly. Your input is already binary and the hasher expects binary, so it is quite circuitous to go through strings.
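
A small demonstration of why going through strings is risky: the pieces value inside the info dict holds raw SHA-1 digests, which are arbitrary bytes, and decoding such bytes with a text charset and encoding them again does not necessarily give back the same bytes. The class name and the choice of UTF-8 below are just for illustration:

```java
import java.nio.charset.StandardCharsets;

class BinaryRoundTrip {
    // Decodes the bytes as UTF-8 text and encodes the text back to bytes.
    // Byte sequences that are not valid UTF-8 are replaced during decoding,
    // so the result can differ from the input.
    static byte[] throughString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }
}
```

Pure ASCII input survives this round trip unchanged, which is why string-based code can appear to work on the header of a torrent file and then silently corrupt the binary parts.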