Find circular replacement patterns

138 views Asked by At

Suppose I would like to expand a string by replacing placeholders from a dictionary. The replacement strings can also contain placeholders:

$pattern = "#a# #b#";
$dict = array("a" => "foo", "b" => "bar #c#", "c" => "baz");
while($match_count = preg_match_all('/#([^#])+#/', $pattern, $matches)) {
    for($i=0; $i<$match_count; $i++) {
        $key = $matches[1][$i];
        if(!isset($dict[$key])) { throw new Exception("'$key' not found!"); }
        $pattern = str_replace($matches[0][$i], $dict[$key], $pattern); 
     }   
}
echo $pattern;

This works fine, as long as there are no circular replacement patterns, for example "c" => "#b#". Then the program will be thrown into an endless loop until the memory is exhausted.

Is there an easy way to detect such patterns? I'm looking for a solution where the distance between the replacements can be arbitrarily long, eg. a->b->c->d->f->a
Ideally, the solution would also happen while in the loop and not with a separate analysis.

2

There are 2 answers

0
chiborg On BEST ANSWER

Thanks to the comment by georg and this post, I came up with a solution that converts the patterns into a graph and uses topological sort to check for circular replacement.

Here is my solution:

$dict = array("a" => "foo", "b" => "bar #c#", "c" => "baz #b#");

# Store incoming and outgoing "connections" for each key => pattern replacement
$nodes = array();
foreach($dict as $patternName => $pattern) {
    if (!isset($nodes[$patternName])) {
        $nodes[$patternName] = array("in" => array(), "out" => array());
    }
    $match_count = preg_match_all('/#([^#])+#/', $pattern, $matches);
    for ($i=0; $i<$match_count; $i++) {
        $key = $matches[1][$i];
        if (!isset($dict[$key])) { throw new Exception("'$key' not found!"); }
        if (!isset($nodes[$key])) {
            $nodes[$key] = array("in" => array(), "out" => array());
        }
        $nodes[$key]["in"][]          = $patternName;
        $nodes[$patternName]["out"][] = $key;
     }   
}
# collect leaf nodes (no incoming connections)
$leafNodes = array();
foreach ($nodes as $key => $connections) {
    if (empty($connections["in"])) {
        $leafNodes[] = $key;
    }
}
# Remove leaf nodes until none are left
while (!empty($leafNodes)) {
    $nodeID = array_shift($leafNodes);
    foreach ($nodes[$nodeID]["out"] as $outNode) {
        $nodes[$outNode]['in'] = array_diff($nodes[$outNode]['in'], array($nodeID));
        if (empty($nodes[$outNode]['in'])) {
            $leafNodes[] = $outNode;
        }
    }
    $nodes[$nodeID]['out'] = array();
}
# Check for non-leaf nodes. If any are left, there is a circular pattern
foreach ($nodes as $key => $node) {
    if (!empty($node["in"]) || !empty($node["out"]) ) {
        throw new Exception("Circular replacement pattern for '$key'!");
    }
}

# Now we can safely do replacement 
$pattern = "#a# #b#";
while ($match_count = preg_match_all('/#([^#])+#/', $pattern, $matches)) {
    $key = $matches[1][$i];
    $pattern = str_replace($matches[0][$i], $dict[$key], $pattern); 
}
echo $pattern;
0
willeM_ Van Onsem On

Single character keys

This is quite easy if the keys are single characters: simply check if a string at the value side contains a character that is a key.

foreach ($your_array as $key => $value) {
    foreach(str_split($value) as $ch) {
        if(array_key_exists ($ch,$your_array) {
            #Problem, cycle is possible
        }
    }
}
#We're fine

Now even if there is cycle, that does not mean it is fired on every string (for instance in the empty string, no patterns will be fired thus no cycles). In that case you can incorporate it into your checker: if a rule is fired for the second time, there is a problem. Simply because if this is the case, the previous pattern has generated an occasion for this, thus the ocasion will be generated over and over again.

String keys

In case the keys are strings as well, this is probably the Post Correspondence Problem which is undecidable...