Difference Between KMP and Regex/DFA-based Searching

2.6k views Asked by At

I am confused about the relation between KMP (Knuth–Morris–Pratt) and Regex (DFA-based) Searching.

My thought is that KMP cannot use regex notations (e.g., (A|B){2}C), so it can only search for a "single" string (e.g., AC or BC, but not AC|BC). Is this true?

Another question, if the pattern is a single string (ABABAC), are they essentially using the same?

3

There are 3 answers

0
I. Cantrell On

It seems(95% sure) both algorithms should do exactly the same thing since step of moving from position i in the string to back to the end of a prefix at position p will be the same as a non-deterministic automaton being in both the states, the one that is right after the prefix, p, and the one further into the string at position i. Once converted into dfa this automaton will have one state that will simulate the NFA and it will finish in linear time. So that regex with kleene star is equivalent to KMP.

1
Micromega On

In fact there is a generalized form of KMP that is a FA (aho-corasick algorithm). It's also easy to use a wildcard. IMO you can uses regular expression with kmp but it's not so easy.

2
Bergi On

KMP cannot use regex notations, so it can only search for a "single" string. Is this true?

Yes. KMP is a string search algorithm, not a pattern matching algorithm.

Another question, if the pattern is a single string (ABABAC), are they essentially using the same?

No, DFA-based matching is not equivalent to the KMP algorithm. It is however possible that advanced regex match implementations employ KMP as an optimisation.