What datastructure is the fastest to find the best matching prefix?

Question

What datastructure is the fastest to find the best matching prefix?

875 views Asked by Niels Basjes At 19 November 2018 at 09:20

Context: I'm working on an analyzer for useragent strings (Yauaa) and as part of this analysis I want to make an educated guess what brand of the device should be reported. I have an implementation that I need to rewrite to be a lot more efficient.

Because I do not want to have a complete list of all devices I want to do the detection based on the prefix of the model.

So I have a dataset with prefixes and the brand that is associated:

"GT-" --> "Samsung"
"LLD-" --> "Huawei"

And then I want to do a .get("GT-1234124") which should result in "Samsung" because that is the "longest matching prefix".

I had a look at the Trie structure but that seems to be for the opposite situation. What I understand is that you start with a set of values and you can efficiently get all the values that starts with the provided prefix.

If I were to implement this from scratch I would use a tree similar to the Trie but walk around it differently. What I'm looking for is a datastructure that does what I need as fast as possible.

What datastructure do you recommend for this usecase?

Is there an existing (proven) implementation I can use?

Original Q&A

There are 1 answers

**Niels Basjes** · Accepted Answer · 2018-11-21T16:16:17+00:00

I did some digging into datastructures and found that essentially the Trie structure is what I need with a different way of walking around the structure.

Since this structure is really simple I created my own implementation that works very well.

See: https://github.com/nielsbasjes/yauaa/blob/master/analyzer/src/main/java/nl/basjes/parse/useragent/utils/PrefixLookup.java

Updates:

I wrote an article about this https://techlab.bol.com/finding-the-longest-matching-string-prefix-fast/
I put my implementation into a separate library which I opensourced and which is already available via maven central. See https://github.com/nielsbasjes/prefixmap

TechQA.

What datastructure is the fastest to find the best matching prefix?

There are 1 answers

Related Questions in DATA-STRUCTURES

Related Questions in BINARY-SEARCH-TREE

Related Questions in PREFIX

Related Questions in PREFIX-TREE

Popular Questions

Popular Tags

Trending Questions