I'm new to hash tables and functions, so I apologize in advance if I got anything wrong.
I'm trying to create a hash table in C++ for a list of about 100k entries, each a 7-digit number. The thing is, I got stuck trying to figure out which hash function to use.
When using %100000 I got ~65k unique keys, while there are ~90k unique entries, which means that about 1/3 of the data will have collisions.
Is this a good hash function to use, or is there a better one for this case that would produce fewer collisions? What size should my table be?
Thanks again!
Edit: The entries are numbers between 1 and 2 million. Is it possible to use the number itself as the key, or do keys for a hash table always have to start at 0?
The standard library comes with two types internally using a hash table: std::unordered_map and std::unordered_set. As your key type is an integral type, you get a hash table pretty conveniently by
std::unordered_map<YourIdType, YourDataType>. You can easily access the data via theMap[someId], but be aware that if the key is not found, a new object is created! If that's not desired, you'd rather use theMap.find(someId), which returns an iterator.
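For illustration, a minimal sketch of the map approach (the concrete key and value types here are just assumptions for demonstration):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // key: the 7-digit id; value: whatever data belongs to it
    std::unordered_map<std::uint32_t, std::string> theMap;
    theMap[1234567] = "some payload"; // inserts or overwrites

    // operator[] would insert a default-constructed value for a
    // missing key, so use find() for pure lookups:
    if (auto it = theMap.find(1234567); it != theMap.end()) {
        std::cout << it->second << '\n';
    }
}
```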
The drawback, though, is that you then store the id twice (internally as a std::pair<YourIdType, YourDataType>). You can avoid that by using a std::unordered_set. To do so, however, you need to specialise std::hash and std::equal_to for your type, as sketched below.
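A minimal sketch of those specialisations, assuming a data type with an id member (YourDataType and its layout are placeholders for your actual type):

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>

struct YourDataType {
    std::uint32_t id; // the 7-digit number
    // ... further members ...
};

// hash by the id only
template <>
struct std::hash<YourDataType> {
    std::size_t operator()(YourDataType const& d) const noexcept {
        return std::hash<std::uint32_t>{}(d.id);
    }
};

// two entries are considered equal if their ids match
template <>
struct std::equal_to<YourDataType> {
    bool operator()(YourDataType const& a, YourDataType const& b) const noexcept {
        return a.id == b.id;
    }
};

std::unordered_set<YourDataType> theSet; // uses the specialisations above
```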
Alternatively, you can provide a custom hasher type (with C++20 that can even be a lambda packed into decltype) to the set as the second template parameter and just implement operator== for your object type, or provide a custom equality comparer type if you need it to compare differently than the operator, maybe like the sketch below. The drawback here is – apart from additional complexity for the specialisations – that you cannot look up objects by the id only; instead you need a complete object of your data type.
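A sketch of that variant (again YourDataType is a placeholder; the comparer shown just mirrors what operator== would do):

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>

struct YourDataType {
    std::uint32_t id;
    bool operator==(YourDataType const& other) const { return id == other.id; }
};

struct IdHasher {
    std::size_t operator()(YourDataType const& d) const noexcept {
        return std::hash<std::uint32_t>{}(d.id);
    }
};

// only needed if the set should compare differently than operator==
struct IdEqual {
    bool operator()(YourDataType const& a, YourDataType const& b) const noexcept {
        return a.id == b.id;
    }
};

std::unordered_set<YourDataType, IdHasher> theSet;           // default equal_to uses operator==
std::unordered_set<YourDataType, IdHasher, IdEqual> theSet2; // custom comparer
```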
So which one is more appropriate/suitable? That depends on your concrete requirements...