Boost flyweight for short strings

I am implementing a system that stores and manipulates many repetitive short strings, for example stock price series. There will be many repeated entries for Microsoft stock prices:

<time1>,MSFT,60.01
<time2>,MSFT,60.02
<time3>,MSFT,60.00

I am thinking of using Boost.Flyweight to reduce memory allocation and the cost of looking up, comparing, and copying those small repetitive ticker names (like MSFT in this case).

But those strings are small to begin with, usually just a few bytes, while a long is already 8 bytes on many 64-bit platforms. Is it worth using Boost.Flyweight in this case?

My understanding of Boost.Flyweight is that it interns strings as integers to improve performance. But I think looking up, comparing, or copying an 8-byte string wouldn't be dramatically different from operating on an 8-byte long. So is it worth the hassle of moving to Boost.Flyweight?

If I have to choose one, my main goal is speed optimization rather than memory optimization.
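To make the "8-byte string versus 8-byte integer" comparison concrete, here is a minimal sketch that packs a short ticker into a 64-bit key, so that equality checks and copies become plain integer operations. `pack_ticker` is a hypothetical helper introduced only for illustration; it is not part of Boost.Flyweight, and it assumes tickers are at most 8 bytes and contain no embedded NUL characters.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string_view>

// Hypothetical helper (not a Boost API): packs a ticker of at most 8 bytes
// into one 64-bit key, one byte per character, first character highest.
// Assumes no embedded '\0' bytes; otherwise distinct strings could collide.
constexpr std::uint64_t pack_ticker(std::string_view s) {
    if (s.size() > 8)
        throw std::length_error("ticker longer than 8 bytes");
    std::uint64_t key = 0;
    for (unsigned char c : s)
        key = (key << 8) | c;  // shift previous bytes up, append this one
    return key;
}
```

Two keys compare equal exactly when the two tickers are equal (under the assumptions above), so `pack_ticker("MSFT") == pack_ticker("MSFT")` is a single integer comparison. The numeric order of the keys does not match lexicographic string order, so this sketch is only suitable for equality and hashing, not for sorting.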

There is 1 answer

sehe On BEST ANSWER

Flyweight is very generic and configurable.

I'd suggest using a backing store of strings allocated from a single, fixed-size pool of memory (e.g. std::vector<CharType>). You would then only need to return std::string_views to ranges of bytes in the backing storage.
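A minimal sketch of that idea, assuming C++17 and plain `char` as the character type. `TickerPool` and its members are hypothetical names invented for this illustration, not a Boost.Flyweight configuration. The buffer is reserved once and never grown, so the views it hands out stay valid for the lifetime of the pool.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Boost): interns short strings into one
// fixed-capacity buffer and hands out string_views into that buffer.
class TickerPool {
public:
    explicit TickerPool(std::size_t capacity_bytes) {
        storage_.reserve(capacity_bytes);  // single allocation, never grown
    }

    // Copying would leave the index pointing into the other pool's buffer.
    TickerPool(const TickerPool&) = delete;
    TickerPool& operator=(const TickerPool&) = delete;

    // Returns a view into the pool; equal inputs yield the same view.
    std::string_view intern(std::string_view s) {
        if (auto it = index_.find(s); it != index_.end())
            return *it;  // already stored
        if (storage_.size() + s.size() > storage_.capacity())
            throw std::length_error("TickerPool: capacity exhausted");
        const std::size_t offset = storage_.size();
        // Stays within capacity, so no reallocation and no dangling views.
        storage_.insert(storage_.end(), s.begin(), s.end());
        std::string_view stored(storage_.data() + offset, s.size());
        index_.insert(stored);
        return stored;
    }

    std::size_t size() const { return index_.size(); }

private:
    std::vector<char> storage_;                   // backing bytes
    std::unordered_set<std::string_view> index_;  // views into storage_
};
```

With a pool constructed as `TickerPool pool(64);`, calling `pool.intern("MSFT")` twice returns views with the same `data()` pointer, so callers that only hold interned views can compare them by pointer and length instead of by content. The lookup on insert still hashes the string, so the saving applies to later comparisons and copies, not to the interning step itself.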

You can configure Flyweight to work like this, but I'd need to find time to demo it.

Alternatively, you could "roll your own". I have some samples of that on StackOverflow:

My experience with Flyweight has varied (https://stackoverflow.com/search?tab=votes&q=user%3a85371%20flyweight, e.g. boost multi_index_container and slow operator++). It seems that a naive implementation of Flyweight is rarely what you want.

UPDATE: I just remembered this related demo I made using perfect hashing for NASDAQ ticker symbols: "Is it possible to map string to int faster than using hashmap?"