I have a boost::multi_index_container whose elements are structs like this:
struct Elem {
    A a;
    B b;
    C c;
};
The main key (in a database sense) is a composite_key of a and b. Other keys exist to perform various types of queries.
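For reference, the container is declared roughly like this (the index on c is just one example of the extra query indices, and ElemContainer is a placeholder name; the exact declaration doesn't matter for the question):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>

namespace bmi = boost::multi_index;

typedef bmi::multi_index_container<
    Elem,
    bmi::indexed_by<
        // main key: composite of a and b
        bmi::ordered_unique<
            bmi::composite_key<
                Elem,
                bmi::member<Elem, A, &Elem::a>,
                bmi::member<Elem, B, &Elem::b>
            >
        >,
        // one of the additional query indices, e.g. by c
        bmi::ordered_non_unique<
            bmi::member<Elem, C, &Elem::c>
        >
    >
> ElemContainer;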
I now need to retrieve the set of all different values of c. These values are by no means unique, but iterating through all entries (albeit ordered), or using std::unique, seems quite a waste, considering that the number of different values of c is expected to be far smaller than the total number of entries (say, 10 to 1000). Am I missing a simple way to obtain this result more efficiently?
I scoured the Boost.MultiIndex documentation and can't seem to find a way to do what you want. I'm interested in knowing if it's doable.
Perhaps the best you can do is maintain a std::map<C, size_t> (or a hash map) alongside your multi_index_container and keep the two "synchronized". The map associates each C value with its occurrence count (frequency); it is essentially a histogram of the C values. Each time you add an Elem to your multi_index_container, you increment the corresponding frequency in the histogram. When you remove an Elem from your multi_index_container, you decrement the corresponding frequency, and when the frequency reaches zero you delete that entry from the histogram.

To retrieve the set of distinct C values, you simply iterate through the <key, value> pairs in the histogram and look at the key part of each pair. If you use a std::map, the distinct C values will come out sorted.
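A minimal sketch of that bookkeeping, assuming the ElemContainer typedef from the question; the helper names add_elem, remove_elem, and distinct_cs are illustrative, not an existing API:

#include <cstddef>
#include <map>
#include <vector>

std::map<C, std::size_t> c_histogram;   // C value -> number of Elems carrying it

void add_elem(ElemContainer& container, const Elem& e)
{
    if (container.insert(e).second)     // true if a new element was inserted
        ++c_histogram[e.c];             // bump the frequency of its C value
}

void remove_elem(ElemContainer& container, ElemContainer::iterator pos)
{
    std::map<C, std::size_t>::iterator h = c_histogram.find(pos->c);
    if (h != c_histogram.end() && --h->second == 0)
        c_histogram.erase(h);           // last occurrence of this C value
    container.erase(pos);
}

// Retrieving the distinct C values: they are the histogram keys, already sorted.
std::vector<C> distinct_cs()
{
    std::vector<C> result;
    for (std::map<C, std::size_t>::const_iterator i = c_histogram.begin();
         i != c_histogram.end(); ++i)
        result.push_back(i->first);     // each key is one distinct C value
    return result;
}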
If you're going to examine the set of distinct C values only once (or rarely), then the approach I described above may be overkill. A simpler approach would be to insert all the C values into a std::set<C> and then iterate through the set to retrieve the distinct C values. You said that the set of distinct C's is much smaller than the total number of C's; the std::set<C> approach should therefore waste much less space than copying the C's to a std::vector, sorting the vector, and then running std::unique.
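Something along these lines (again just a sketch, reusing the ElemContainer name from above):

#include <set>

std::set<C> collect_distinct_cs(const ElemContainer& container)
{
    std::set<C> distinct;
    for (ElemContainer::const_iterator i = container.begin();
         i != container.end(); ++i)
        distinct.insert(i->c);          // duplicates are silently ignored
    return distinct;                    // each C value exactly once, sorted
}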
Let's compare the time complexity of copying to a set versus copying to a vector, sorting, and then running std::unique. Let N be the total number of C values, and let M be the number of distinct C values. The set approach, by my reckoning, should have a time complexity of O(N*log(M)); since M is small and does not grow much as N grows, this is effectively O(N). The sort+unique technique, on the other hand, should have a time complexity of O(N*log(N)).
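For comparison, the vector-based alternative being discussed would be something like this (sketch only, same assumed names as above):

#include <algorithm>
#include <vector>

std::vector<C> collect_distinct_cs_via_sort(const ElemContainer& container)
{
    std::vector<C> cs;
    cs.reserve(container.size());       // N values will be copied
    for (ElemContainer::const_iterator i = container.begin();
         i != container.end(); ++i)
        cs.push_back(i->c);

    std::sort(cs.begin(), cs.end());                        // O(N*log(N))
    cs.erase(std::unique(cs.begin(), cs.end()), cs.end());  // keep one of each
    return cs;
}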