As we learned, given a regex pattern (e.g., A B A B A C
), we can convert it to a DFA. In this example, it would be like a chain (you can test it here).
This "chain-like" DFA can tell if a given string match the pattern or not (i.e., accept/reject it); But it cannot tell if there is any occurrence within the string and identify all of them.
Example:
Suppose this is the to-search string: A B C A B A B A B A C A B C
Although there is an occurrence starting from the 6th character, the "chain-like" DFA cannot tell this. All it can do is to reject this string.
Question: Is it possible to design a regex that support such functionality?
(Note: I understand this question is kind of confusing; I would like to clarify it confuses you.)
The language of strings containing the substring
ABABAC
is matched by the regular expression:Where the symbol
.
denotes a subexpression that matches any single input symbol (e.g.(A|B|C)
, if the input language only has the symbolsA
,B
andC
). To tell if a string has the substringABABAC
, you can build an NFA or a DFA from this regular expression, and check if it accepts your string.Determining the location of the substring in the input string is not possible with a (single) standard N/DFA, simply because an N/DFA is defined to only return one bit of information (accept/reject). However, it is possible to implement an "augmented N/DFA" that, in addition to matching the input, also keeps track of where in the string each state transition last occurred; this information is enough to efficiently reconstruct the location of the substring.