http://judy.sourceforge.net/downloads/10minutes.htm
Question -- Why would only 4 cache-line fills be required? Isn't the cache line 64-bytes these days?
With an expanse of 2^32 (or 256^4), a maximum of 4 cache-line fills would be required for a worst-case highly populated 256-ary digital tree access. In an expanse of 2^64 (or 256^8), 8 cache-line fills would be the worst case. In practice, Judy does much better than this. The reason is (in part) due to the fact "density" of the keys is seldom the lowest possible number in a "majority" of the sub-expanses. It takes high density combined with high population to increase the depth of a Judy tree. It would take a long time to explain why. The short version is an analogy with sand. It takes a lot of sand to build a tall sand pile, especially if it takes 256 grains to support 1 grain above it. In a 64-bit Judy, it would probably require more RAM than exists on this planet to get it to have 8 levels. A binary tree reaches 8 levels with a population of 256. It is truly remarkable to me how much research has been done on binary trees and still being taught.
It's a 256-ary tree, so to traverse down to a single item in a densely populated tree with 32-bit keys, you need to descend log256(232) = 4 levels.
To do that in 4 cache line fills, you need each 256-ary internal node, including encoded pointers to its 256 children, to fit into a single cache line. A cache line is only 512 bits, so that's pretty impressive.