Context: I'm working on an analyzer for useragent strings (Yauaa) and as part of this analysis I want to make an educated guess what brand of the device should be reported. I have an implementation that I need to rewrite to be a lot more efficient.
Because I do not want to have a complete list of all devices I want to do the detection based on the prefix of the model.
So I have a dataset with prefixes and the brand that is associated:
- "GT-" --> "Samsung"
- "LLD-" --> "Huawei"
And then I want to do a .get("GT-1234124") which should result in "Samsung" because that is the "longest matching prefix".
I had a look at the Trie structure but that seems to be for the opposite situation. What I understand is that you start with a set of values and you can efficiently get all the values that starts with the provided prefix.
If I were to implement this from scratch I would use a tree similar to the Trie but walk around it differently. What I'm looking for is a datastructure that does what I need as fast as possible.
What datastructure do you recommend for this usecase?
Is there an existing (proven) implementation I can use?
I did some digging into datastructures and found that essentially the Trie structure is what I need with a different way of walking around the structure.
Since this structure is really simple I created my own implementation that works very well.
See: https://github.com/nielsbasjes/yauaa/blob/master/analyzer/src/main/java/nl/basjes/parse/useragent/utils/PrefixLookup.java
Updates: