How to design a regex used for searching a pattern, rather than validating a pattern?

221 views Asked by At

As we learned, given a regex pattern (e.g., A B A B A C), we can convert it to a DFA. In this example, it would be like a chain (you can test it here).

This "chain-like" DFA can tell if a given string match the pattern or not (i.e., accept/reject it); But it cannot tell if there is any occurrence within the string and identify all of them.

Example: Suppose this is the to-search string: A B C A B A B A B A C A B C

Although there is an occurrence starting from the 6th character, the "chain-like" DFA cannot tell this. All it can do is to reject this string.

Question: Is it possible to design a regex that support such functionality?

(Note: I understand this question is kind of confusing; I would like to clarify it confuses you.)

1

There are 1 answers

2
Ilmari Karonen On

The language of strings containing the substring ABABAC is matched by the regular expression:

.*ABABAC.*

Where the symbol . denotes a subexpression that matches any single input symbol (e.g. (A|B|C), if the input language only has the symbols A, B and C). To tell if a string has the substring ABABAC, you can build an NFA or a DFA from this regular expression, and check if it accepts your string.

Determining the location of the substring in the input string is not possible with a (single) standard N/DFA, simply because an N/DFA is defined to only return one bit of information (accept/reject). However, it is possible to implement an "augmented N/DFA" that, in addition to matching the input, also keeps track of where in the string each state transition last occurred; this information is enough to efficiently reconstruct the location of the substring.