I have input keys that can cheaply be converted to a uint64_t (i.e., the input contains at most 64 bits). Each unique key (not yet in the map) will be assigned some data (a pointer to an object); much like a std::map<uint64_t, Object*>, thus.
Inserting a new key is not time critical because there will only be a small number of total keys, which are never removed again - where "small" means less than 100.
A typical implementation would use a std::vector (because of the small number of elements) and either just scan over all elements, or use a binary search (i.e. boost::flat_map); but this is not optimal enough for me. Once all elements have been inserted the map is static, and it should be possible to find the pointer that belongs to a given key in a mere few clock cycles.
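For reference, a minimal sketch of that baseline (my code, not from any library; Object is just a placeholder type): a sorted vector with binary search, which is essentially what boost::flat_map does under the hood.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Object;  // Placeholder for the mapped type.

// Baseline lookup: binary search in a vector sorted by key.
Object* lookup(std::vector<std::pair<uint64_t, Object*>> const& map, uint64_t key)
{
  auto it = std::lower_bound(map.begin(), map.end(), key,
      [](std::pair<uint64_t, Object*> const& e, uint64_t k) { return e.first < k; });
  return (it != map.end() && it->first == key) ? it->second : nullptr;
}
```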
What I am thinking of is determining a perfect hash function every time a new key is inserted, and then using this hash function to look up the pointers.
This requires two things:
An algorithm to find a cheap hash function that converts a smallish list of given 64-bit values to 8, 9, 10 (or how many is required) bits with no collisions (aka, a perfect hash function), so that the latter can be used directly in a lookup table (that is 256, 512, 1024... in size).
A method to dynamically create such a hash function; where I have to admit that I am not willing to run an external compiler and load the new hash function dynamically ;-).
PART 3
I finished my implementation of UltraHash. The result is better than I had expected possible - and indeed much, much better than the "multiplication hash" approach. Initializing the UltraHash object with the keys doesn't take 0.3 seconds - no.. it takes 1 to 3 ms!

The amount of code involved is too much to paste here (~800 lines, I think), although a lot of those lines are comments: I explained in excruciating detail what the code is doing and how the algorithm works. If anyone is interested and wants it to be included here, let me know and I'll put some more time into doing that.
The main idea is as follows:
The n keys are 64 bits each - we can see the whole as an n × 64 matrix of bits. Each of the 64 columns is examined for randomness (extremely rudimentary), after which they are sorted in order of likeliness to contribute to an even distribution of the keys over sets of 32 to 64 keys each. Then by brute force a combination of columns is picked, resulting in a partitioning of the keys into said sets, where for each set an attempt is made to find a perfect hash. Upon failure, the number of columns is increased.

Finding the perfect hash for a set of 32 to 64 keys is done by means of linear algebra over GF(2), which involves Gaussian elimination etc.
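A sketch of the column-based partitioning step as I describe it above (my illustration, not the actual UltraHash code; cols would hold the column positions picked by brute force):

```cpp
#include <cstdint>

// Compute which set a key belongs to by gathering the selected bit
// columns; n_cols bits select one of 2^n_cols sets.
uint32_t set_index(uint64_t key, int const cols[], int n_cols)
{
  uint32_t idx = 0;
  for (int i = 0; i < n_cols; ++i)
    idx |= static_cast<uint32_t>(key >> cols[i] & 1) << i;
  return idx;
}
```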
You can find the code here and the accompanying header here.
Since the code is part of my utils git submodule, it uses other files from that library, as well as debugging code from cwds, but it should be easy to make it work standalone. The project ai-utils-testsuite is a configurable and compilable project that contains benchmark and test code.

Below follow the comments that precede one of the functions in UltraHash.cxx:

And trust me, that's the longest comment I ever wrote for one function :).
ORIGINAL POST:
The random multiplication factor approach
I tried the same approach as samgak: my hash function was just a multiplication of the uint64_t key with a uint64_t multiplication factor, trying random values for the multiplication factor until the number of high bits of the result needed to make all keys different was as small as possible. It turns out that this easily takes up to 0.3 seconds to reach a 9-bit output.

Here I used the following function to find the number of high bits required:
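(A minimal sketch of what that function computes - my reconstruction, since the original code isn't reproduced here; the name min_high_bits is made up:)

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Return the smallest number of high bits of the 32-bit values in
// `hashes` that still distinguishes all of them, or -1 if even all
// 32 bits contain duplicates.
int min_high_bits(std::vector<uint32_t> hashes)
{
  for (int bits = 1; bits <= 32; ++bits)
  {
    int const shift = 32 - bits;
    std::vector<uint32_t> tops;
    tops.reserve(hashes.size());
    for (uint32_t h : hashes)
      tops.push_back(h >> shift);
    std::sort(tops.begin(), tops.end());
    if (std::adjacent_find(tops.begin(), tops.end()) == tops.end())
      return bits;  // All `bits`-bit prefixes are unique.
  }
  return -1;
}
```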
The hashes here are, for example, just the most significant 32 bits of the result of the multiplication; the lower 32 bits would be shifted out / ignored anyway.

Considering that 100 values, being less than 128, theoretically fit into 7 bits (although I never even found 8 bits with the above approach; which isn't weird when you realize that the chance for a random attempt corresponds with the Birthday Problem with 100 people and 256 birthdays, which has a 1 in 5807421181 chance of having no collisions) and that waiting up to 0.3 seconds, and theoretically much longer, was a bit annoying -- I started to think about a way to calculate the hash function.
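For reference, that number is the birthday-problem product P = ∏ₖ₌₀⁹⁹ (256−k)/256 = 256!/(156!·256¹⁰⁰) ≈ 1.72×10⁻¹⁰, i.e. about 1 in 5.8×10⁹.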
In order to be able to do any calculations, I decided to use a linear algebra approach.
Using linear algebra
The idea is to use linear algebra (matrices, vectors). And since we are working with bits, the most logical thing to do is to work over ℤ₂ (the integers modulo 2, aka GF(2)).
Let all the input keys be represented by 64-dimensional column vectors over ℤ₂: Vᵢ.
Note there can exist at most 64 such linearly independent vectors. Assume for a moment that we have 64 such keys. Then we can construct the 64×64 matrix

K = (V₀ V₁ ⋯ V₆₃)

and calculate the inverse of that (which exists because the Vᵢ are linearly independent). Let's denote this inverse with K⁻¹.
Furthermore, note that we need 6 bits to enumerate 64 input keys. Now let the 6×64 matrix L be the matrix that consists of all 64 possible 6-dimensional column vectors:

L = (ℓ₀ ℓ₁ ⋯ ℓ₆₃)

where ℓᵢ is the 6-bit binary representation of i.
Let the 6×64 matrix M then be

M = L·K⁻¹

Then

M·K = L·K⁻¹·K = L

and left-multiplying any input key Vᵢ with M will give the unique 6-bit column vector ℓᵢ; aka - the perfect hash.
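In code, one row of M times a key over ℤ₂ is an AND followed by a parity test. A quick sketch (my illustration, not the code from the post; requires C++20 for std::popcount):

```cpp
#include <bit>
#include <cstdint>

// Multiply a 64-bit key with a 6x64 matrix M over Z_2. Each row of M
// is stored as a 64-bit mask; a row-times-vector product over Z_2 is
// the parity of (row AND key).
uint32_t hash6(uint64_t const M[6], uint64_t key)
{
  uint32_t h = 0;
  for (int row = 0; row < 6; ++row)
    h |= static_cast<uint32_t>(std::popcount(M[row] & key) & 1) << row;
  return h;  // A 6-bit perfect hash, given a suitable M.
}
```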
Linearly dependent keys
The main problem with this approach is of course that the input keys might not be linearly independent, which is certainly going to be the case when there are more than 64 keys.
In order to detect if/when keys are not linearly independent, we can simply triangulate the matrix K as new keys come in. In order to give examples, let's assume that we have vectors/keys of 4 bits instead of 64.
Consider the following five distinct input keys:
Then we start with an identity matrix (all ones on the main diagonal and the rest zeroes), and put the first key V₀ such that its top 1 aligns with a 1 on the matrix diagonal:
As a result, this matrix has an inverse, because the upper-left triangle is all zeroes and the main diagonal is all ones.
The next key, V₁, can likewise be put in this matrix as-is:
Now column four is used, so we can't use it to put in V₂ (which also has a 1 at the top). Therefore, we first subtract the fourth column from V₂ (note that subtraction over ℤ₂ is addition modulo 2, aka an XOR operation). V₂ then becomes:
and we can put that in column 2 of the matrix (which doesn't change it). V₃ can now no longer be put in column 2, because it is "used".
If we try to use the same trick, subtracting column 2 from V₃, we get all zeroes - which indicates that the first four keys were linearly dependent. We can mitigate this by extending the vector with a 1 at the bottom (and all the others with zeroes):
If we repeat the same procedure with these inputs, we get the exact same matrix, but with an extra 1 at the bottom left:
V₄ turns out to be linearly dependent as well: it has a 1 at the top, thus we subtract the last column, and then we get the (now) fourth column, which was already used. Therefore we have to add another row:
and
Now let's construct K as described above (just put all the V's in it):
I left the fourth column empty because we could only find three linearly independent keys, even though the (original) keys are 4 bits. Aka, if the keys were 64 bits and we could only find 62 that were linearly independent, we would leave 2 columns empty. That is ... we'll fill them in with pseudo keys that ARE linearly independent of the previous ones; that will lead to those columns having their lower bits zero:
Moreover, the missing linearly independent keys are trivially the unused column(s) of K', so that K can be completed by extending the diagonal of ones from the bottom right to the top left:
Because now all columns are linearly independent, this matrix has an inverse, which in this case is
Originally we had 5 input keys, but even with the extra pseudo key we can enumerate them with 3 bits. However, let's take into account that we're not really interested in what the pseudo key results in - so we use the following L:
where you can fill in whatever you want in column four. If you fill in the next number (5):
then
Constructing a Kₒ with the original keys (V₀, V₁, V₂, V₃ and V₄):
and using M without the last two columns,
we get
showing that multiplying with this matrix M is not a perfect hash: the last column is the same as the second one.
The exact change is obvious: if we had used the full matrix (including the last two columns) and the extended keys then only the last key has a 1 in the last column and is therefore XOR-ed with the last column of the matrix:
Thus, the only way to correct the output is by knowing in which column of K it belongs (the last column) based on the first four bits of that column, which is much like the original problem we try to solve to begin with.
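The triangulation described above is, in code, the classic "XOR basis" construction. A sketch under the assumption that keys are full 64-bit values (my illustration, not the actual UltraHash code):

```cpp
#include <cstdint>

// Maintain a basis of linearly independent keys over Z_2, kept in
// triangular form: slot[b], when non-zero, has its highest set bit at
// position b. insert() returns false when the new key turns out to be
// linearly dependent on the keys already in the basis.
struct GF2Basis
{
  uint64_t slot[64] = {};

  bool insert(uint64_t key)
  {
    for (int b = 63; b >= 0; --b)
    {
      if (!(key >> b & 1))
        continue;
      if (slot[b] == 0)
      {
        slot[b] = key;
        return true;
      }
      key ^= slot[b];  // Subtract (XOR) the stored vector; clears bit b.
    }
    return false;      // Reduced to zero: linearly dependent.
  }
};
```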
PART 2
The trick is to divide the keys into sets of 64 or fewer linearly independent keys. This is easy because for fully random 64-bit values, on average the first 62.18 keys will be linearly independent, which is close enough to 64 to be efficient.
The keys should be well hashed however; for example, if the 64-bit keys were uint64_t values between 0 and 64, only using 6 bits, then you'd obviously only find 6 linearly independent keys at a time and would need at least 64 / 6 = 11 sets.

This would slow down the lookup because you have to be able to determine in which set a key belongs before you can multiply it with the correct matrix.
One way to do this is to sort the keys and remember the value of the first key in the next set (the first key that is linearly dependent on the keys in the previous set).
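That lookup would be a binary search over the boundary keys. A sketch (my illustration; boundaries holds the first key of every set except the first):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Return the index of the set that `key` belongs to: the number of
// boundary keys that are less than or equal to `key`.
int find_set(std::vector<uint64_t> const& boundaries, uint64_t key)
{
  return static_cast<int>(
      std::upper_bound(boundaries.begin(), boundaries.end(), key)
      - boundaries.begin());
}
```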
For example, let's say we have three sets:
because V₃ can be expressed in terms of the keys in S₀ (e.g. V₀ + V₂) and V₅ could be expressed as V₃ + V₄.

Let's try to think of an example where this is the case. Remember that we sorted the keys, so V₀ < V₁ < V₂ < V₃ < V₄ < V₅ < V₆ < V₇ < V₈ when represented as numbers. Let's say that the least significant bit of the unsigned integers that they are stored in corresponds with row 0 of the vectors; then possible values could be:

or
so that indeed V₃ = V₀ + V₂ and V₅ = V₃ + V₄.

But we can do better! There is no need to sort the keys and use < to specify the boundaries between sets. A very effective way would be to use a binary tree to find the correct set, where each time one particular bit is tested.

Note that with an expectation of ~100 keys and sets with a size of ~62, we expect to have only two sets! So only a single bit to switch sets on is needed. This bit can be found by trial and error. In the above example we can create the following two sets of linearly independent vectors:
by simply switching on the least significant bit.
The total lookup code for one hundred 64-bit keys will then consist of testing 1 bit, which determines which table of six 64-bit values to use, followed by six XOR operations (that can be done in parallel in theory, so that should be around 3 clock cycles) and likewise 6 parity tests - which I'll count as 6 clock cycles (see this post for more on parity). That brings me to a dozen clock cycles or so. And the result would be 7 bits, not 9! Also, calculating these matrices can be done much, much faster than 0.3 seconds.
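Putting the pieces together, the full lookup might look like this (my sketch of the scheme just described, using AND-plus-parity for the matrix rows as in the earlier sketch; C++20):

```cpp
#include <bit>
#include <cstdint>

// Two sets, each with a 6x64 matrix over Z_2 stored as six 64-bit row
// masks. One key bit selects the set; the set bit plus the 6-bit
// matrix product form a 7-bit index into a table of 128 pointers.
uint32_t perfect_hash7(uint64_t const M[2][6], uint64_t key)
{
  uint32_t const set = key & 1;  // e.g. switch on the least significant bit.
  uint32_t h = set << 6;
  for (int row = 0; row < 6; ++row)
    h |= static_cast<uint32_t>(std::popcount(M[set][row] & key) & 1) << row;
  return h;
}
```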
I'll post an update when I have working code.