I'm working in Java with a large HashMap (millions of entries), built with a capacity of 10,000,000 and a load factor of 0.75, that is used to cache some values.
Cached values become useless over time (they are not accessed anymore), but I can't remove the useless ones as I go, so I would like to empty the cache entirely when its performance starts to degrade. How can I decide when it's a good time to do that?
For example, with a capacity of 10 million and a load factor of 0.75, should I empty it when it reaches 7.5 million elements? I've tried various threshold values, but I would like an analytical one.
I've already verified that emptying it when it's quite full boosts performance (the first 2-3 algorithm iterations after the wipe just fill it back up, then it starts running faster than before the wipe).
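For reference, the wipe itself is just a size check, something like this sketch (the field names and the WIPE_THRESHOLD value are placeholders; the threshold is exactly the number I'd like to derive analytically):

// The cache as described above: ~10M capacity, 0.75 load factor.
private final Map<Long, Float> cache = new HashMap<Long, Float>(10000000, 0.75f);

// Hand-tuned threshold; this is the value I'd like to derive analytically.
private static final int WIPE_THRESHOLD = 6000000;

private void wipeIfTooFull()
{
    // clear() nulls the entries but keeps the already-allocated table,
    // so no re-growing happens while the cache fills back up.
    if (cache.size() > WIPE_THRESHOLD)
        cache.clear();
}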
EDIT: ADDITIONAL INFO
The HashMap has long keys and float values. It stores cached correlations between contents: each correlation is a dot product of tag vectors, so I cache the results to improve performance.
So basically what I do is compute a long key from the hash codes of the two contents:
static private long computeKey(Object o1, Object o2)
{
    int h1 = o1.hashCode();
    int h2 = o2.hashCode();
    if (h1 < h2)
    {
        // Order the two hash codes so the key is symmetric in (o1, o2).
        int swap = h1;
        h1 = h2;
        h2 = swap;
    }
    // Mask h2 so sign extension of a negative hash code does not clobber
    // the upper 32 bits that hold h1.
    return ((long) h1) << 32 | (h2 & 0xFFFFFFFFL);
}
and use that key to retrieve the stored values. Since this is hierarchical clustering, contents get merged and their correlation values with other contents are not needed any more; that's why I want to wipe the HashMap from time to time, to avoid degradation due to useless values inside it.
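For context, the lookup is the usual get-or-compute pattern, roughly like this sketch (it assumes the cache field and computeKey shown above; correlation(...) is a placeholder for the actual dot product of the tag vectors):

float cachedCorrelation(Object c1, Object c2)
{
    long key = computeKey(c1, c2);
    Float cached = cache.get(key);
    if (cached != null)
        return cached.floatValue();

    // Cache miss: compute the dot product of the tag vectors and remember it.
    float value = correlation(c1, c2); // placeholder for the real computation
    cache.put(key, value);
    return value;
}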
Using a WeakHashMap would unpredictably wipe out data even when it is still needed; I have no control over when that happens.
Thanks
Why not use an LRU cache? Java's LinkedHashMap documentation notes that a map constructed in access order is "well-suited to building LRU caches".
So basically, every once in a while when your map gets too big, just delete the first x entries that the iterator gives you; in an access-ordered map, those are the least recently used ones.
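For instance, a sketch of that manual trimming (the class name and MAX_ENTRIES limit are just example choices; the map must be constructed with accessOrder = true so iteration starts at the least recently used entry):

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class ManualLruTrim
{
    private static final int MAX_ENTRIES = 1000000; // example limit

    // The third constructor argument (true) selects access order, so the
    // iterator returns entries from least to most recently used.
    private final LinkedHashMap<Long, Float> cache =
            new LinkedHashMap<Long, Float>(MAX_ENTRIES, 0.75f, true);

    void trim()
    {
        // Drop the least recently used entries until we are back under the limit.
        Iterator<Map.Entry<Long, Float>> it = cache.entrySet().iterator();
        while (cache.size() > MAX_ENTRIES && it.hasNext())
        {
            it.next();
            it.remove();
        }
    }
}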
See the documentation for removeEldestEntry to have this done for you automatically.
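A minimal sketch of that approach (the LruCache class name and the 1,000,000-entry limit are just example choices):

import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V>
{
    private final int maxEntries;

    LruCache(int maxEntries)
    {
        // true = access order, so the "eldest" entry is the least recently used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
    {
        // Returning true makes LinkedHashMap evict the eldest entry
        // right after each insertion that pushes the size over the limit.
        return size() > maxEntries;
    }
}

With the keys and values from the question, new LruCache<Long, Float>(1000000) would then replace the plain HashMap, and no manual wiping is needed.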