Similar to hash values but something that returns an int

228 views Asked by At

Using the hash function MD5 on a string creates a very long value, and it creates the same value for the same string every time. Now, my question is: is there a way to do something similar, like give it a string and it returns the same integer every time, and also the integers that it returns for different string are inside a specific interval. What i mean is something like this.

Ex: Give it "Mary had a little lamb." and it returns the value 10. Give the same string, it returns 10 again.

Please do ask, in case i wasn't entirely clear.

2

There are 2 answers

5
Alex D On BEST ANSWER

You are describing a "hash function". Look it up on Wikipedia.

MD5 is one kind of hash function. Most MD5 implementations return a string, but that string is just a representation of a (LARGE) integer. You can take an MD5 hash, and then use as many of the low-order bits as you need to get an integer of the desired size. If the desired range is not a power of 2, you will need to do a modulo operation to get it into the desired range.

Also, virtually every modern programming language has a built-in function for hashing strings, which returns an integer. In Java, it's String.hashCode(). In Ruby, it's String#hash.

In this case, the language is Javascript, which (I was shocked to learn) doesn't seem to have something like this built in. This is String.hashCode() from the Java platform (perhaps you can port it to Javascript):

public int hashCode() {
int h = hash;
if (h == 0) {
    int off = offset;
    char val[] = value;
    int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
0
Wladimir Palant On

You can use the lower bytes of an MD5 hash. You have to consider that JavaScript (at least in Firefox 9) can use something like 48 bits (6 bytes) to store exact integer numbers, the length of an MD5 hash on the other hand is 128 bits (16 bytes). So you will necessarily have more hash collisions than you would normally get with MD5. But still:

function toHashCode(str)
{
  // Convert string to an array of bytes
  var array = Array.prototype.slice.call(str);

  // Create MD5 hash
  var hashEngine = Components.classes["@mozilla.org/security/hash;1"]
                             .createInstance(Components.interfaces.nsICryptoHash);
  hashEngine.init(hashEngine.MD5);
  hashEngine.update(array, array.length);
  var hash = hashEngine.finish(false);

  // Turn the first 6 bytes of the hash into a number
  var result = 0;
  for (var i = 0; i < 6; i++)
    result = result * 256 + hash.charCodeAt(i);
  return result;
}

alert(toHashCode("test"));  // Displays 265892827251497
alert(toHashCode("Mary had a little lamb."));   // Displays 117938552300214