I want to create a Hash Map (or another structure, if you have any suggestions) to store key value pairs. The keys will all be inserted at once at the same time as the map is created, but I don't know what the keys will be (arbitrary length strings) until runtime, when I need to create the map.
I am parsing a query string like this "x=100&name=bob&color=red&y=150"
(but the string can have an unlimited number of variables and the variables can have any length name).
I want to parse it once and create a Hash Map, preferably minimal and with a perfect hash function to satisfy linear storage requirements. Once the map is created the values won't be modified or deleted, no more key value pairs will be added to the map either, so the entire map is effectively a constant. I'm assuming that a variable doesn't occur twice in the string (IE. "x=1&x=2"
is not valid).
I am coding in C
, and currently have a function that I can use like get("x")
which will return the string "100"
, but it parses the query string each time which takes O(n)
time. I'd like to parse it once when it is first loaded since it is a very large query string and every value will be read several times. Even though I'm using C
, I don't need code in C
as an answer. Pseudocode, or any suggestions at all would be awesome!
Try GPL'd gperf, or Bob Jenkins' public domain implementation in C
Procedure:
receive query string and identify domain of perfect hash function by enumerating the list of keys
provide these keys and list size (the range will be 1..size) to the perfect hash generation function derived from above reference implementations
Use the perfect hash function generated to create the HashMap
Use the same perfect hash function to process the
get
requests in the HashMapEdit Necrolis noted in the comment below that the reference implementations output perfect hash functions in C source code, so you'll need to modify them to generate something like a bytecode for a VM instead. You could also use an interpretative language like embedded Scheme or Lua.
It would be interesting to know if this is worth the effort over a simple (non-perfect) HashMap when the overhead of creating the perfect hash function is amortized over the lookups
Another option is Cuckoo hashing which also has O(1) lookups