I am implementing a system that stores and manipulates a lot of repetitive short strings, for example stock price series. I will have many repeated entries for Microsoft stock prices:
<time1>,MSFT,60.01
<time2>,MSFT,60.02
<time3>,MSFT,60.00
I am thinking of using Boost::Flyweight to reduce the memory allocation and the string lookup/comparison/copying cost of those small repetitive ticker names (like MSFT in this case).
But the thing is that those strings are pretty small to begin with, usually just a few bytes, while a long is already 8 bytes on modern computers. Is it worth using Boost::Flyweight in this case?
My understanding of Boost::Flyweight is that it interns strings as integers to improve performance. But I think looking up, comparing, or copying an 8-byte string wouldn't be dramatically different from operating on an 8-byte long. So is it worth the hassle of moving to Boost::Flyweight?
If I have to choose one, my main goal is speed optimization rather than memory optimization.
Flyweight is very generic and configurable.
I'd suggest using a backing store of strings allocated from a single, fixed-size pool of memory (e.g. std::vector<CharType>). You would then only need to return std::string_views to ranges of bytes in the backing storage. You can configure Flyweight to work like this, but I'd need to find time to demo it.
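To make the pooled approach concrete, here is a minimal hand-rolled sketch (it does not use Boost.Flyweight itself). The names TickerPool and same_symbol are hypothetical, and the design assumes C++17 and that the total size of all distinct strings fits in a capacity chosen up front, so the buffer never reallocates and the returned views stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical helper: interns short strings into one pre-reserved buffer.
class TickerPool {
public:
    explicit TickerPool(std::size_t capacity_bytes) {
        storage_.reserve(capacity_bytes);  // fixed-size pool, reserved once
    }

    // Copying (or moving) the pool would leave the stored views dangling.
    TickerPool(const TickerPool&) = delete;
    TickerPool& operator=(const TickerPool&) = delete;

    // Returns a view into the pool. Equal inputs always yield the same view.
    std::string_view intern(std::string_view s) {
        auto it = index_.find(s);
        if (it != index_.end()) return *it;

        // Refuse to grow past the reserved capacity: a reallocation would
        // invalidate every view handed out so far.
        if (storage_.size() + s.size() > storage_.capacity())
            throw std::length_error("TickerPool exhausted");

        const std::size_t offset = storage_.size();
        storage_.insert(storage_.end(), s.begin(), s.end());
        std::string_view stored(storage_.data() + offset, s.size());
        index_.insert(stored);
        return stored;
    }

    std::size_t size() const { return index_.size(); }

private:
    std::vector<char> storage_;                   // backing bytes
    std::unordered_set<std::string_view> index_;  // views into storage_
};

// Interned views can be compared by address and length, without touching
// the characters.
inline bool same_symbol(std::string_view a, std::string_view b) {
    return a.data() == b.data() && a.size() == b.size();
}
```

With this, a record can hold a std::string_view obtained from pool.intern("MSFT") instead of a std::string: copying it copies a pointer and a length, and same_symbol compares two such views without a character-by-character comparison. Whether that beats plain short strings for your workload is something to measure rather than assume.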
Alternatively, you could "roll your own". I have some samples of that on StackOverflow:
My experience with Flyweight has varied (https://stackoverflow.com/search?tab=votes&q=user%3a85371%20flyweight, e.g. boost multi_index_container and slow operator++). It seems that a naive use of Flyweight is rarely what you want.
UPDATE: I just remembered this related demo I made using perfect hashing for NASDAQ ticker symbols: Is it possible to map string to int faster than using hashmap?