First of all, this is not a language-specific question. The example below uses PHP, but it's more about the method (regex?) for finding the answer.
Let's say I have an array:
$array = ['The Bert and Ernie game', 'The Bert & Ernie game', 'Bert and Ernie game', 'Bert and Ernie game - english version', 'Bert & Ernie (game)', 'Bert and Ernie - game']; // etc.
I want to extract the string that best represents all of these variations, i.e. the most important combination of words. So I want to do something like:
$magicPattern = [something that renders most important occurrences];
preg_match($magicPattern, $array, $matches);
print_r($matches);
As an output I would like to receive something like: "Bert and Ernie game"
PS: I'm not necessarily looking for actual code; a concept for doing this would be great too.
UPDATE:
My current code is below. Any thoughts on whether this is a good way of finding the best version of an occurrence? I'm having a hard time figuring that out from the source of the function.
$array = []; // candidate string => summed Levenshtein distance to all candidates
$array['The Bert and Ernie game'] = 0;
$array['The Bert & Ernie game'] = 0;
$array['Bert and Ernie game'] = 0;
$array['Bert and Ernie game - english version'] = 0;
$array['Bert & Ernie (game)'] = 0;
$array['Bert and Ernie - game'] = 0;
foreach ($array as $currentKey => $currentVal) {
    foreach ($array as $matchKey => $matchVal) {
        $array[$currentKey] += levenshtein($currentKey, $matchKey);
    }
}
// Sort by total distance while keeping the string keys. array_flip() would
// silently drop candidates whose totals happen to be equal.
asort($array);
echo array_keys($array)[0]; // Bert and Ernie game
There are many ways to solve a problem like this; personally, I wouldn't recommend a regex for it. This is typically something you would solve with a full-text search index (searching for "fulltext search" turns up many ways to do this).
For this particular case, assuming you don't have too much data, you could just compute the Levenshtein distance: http://php.net/manual/en/function.levenshtein.php
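As a rough sketch of that idea (not the only way to do it): sum the Levenshtein distance from each string to every other string and keep the one with the lowest total, which is the string "closest" to the whole group. The function name mostCentralString() is made up for this example. Note that levenshtein() compares bytes rather than multibyte characters, and before PHP 8.0 it only accepted strings of up to 255 characters.

```php
<?php
/**
 * Hypothetical helper: returns the candidate with the smallest summed
 * Levenshtein distance to all candidates, or null for an empty list.
 *
 * @param string[] $candidates
 */
function mostCentralString(array $candidates): ?string
{
    if ($candidates === []) {
        return null;
    }

    $totals = [];
    foreach ($candidates as $i => $current) {
        $totals[$i] = 0;
        foreach ($candidates as $other) {
            // The distance of a string to itself is 0, so no need to skip it.
            $totals[$i] += levenshtein($current, $other);
        }
    }

    // On ties, the first candidate with the minimal total wins.
    $bestKey = array_keys($totals, min($totals))[0];

    return $candidates[$bestKey];
}

$titles = [
    'The Bert and Ernie game',
    'The Bert & Ernie game',
    'Bert and Ernie game',
    'Bert and Ernie game - english version',
    'Bert & Ernie (game)',
    'Bert and Ernie - game',
];

echo mostCentralString($titles), PHP_EOL;
```

This does one levenshtein() call per pair of strings, so the cost grows quadratically with the number of candidates. That is fine for a handful of titles, less so for thousands.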
Or use the similar_text() function: http://php.net/manual/en/function.similar-text.php
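The same "closest to everything else" idea can be sketched with similar_text(). Its optional third argument receives the similarity as a percentage, so here the highest total wins instead of the lowest. The function name mostSimilarString() is invented for this example, and whether the percentage or the raw count of matching characters works better for your data is something you would have to try out.

```php
<?php
/**
 * Hypothetical helper: returns the candidate with the highest summed
 * similar_text() percentage against all other candidates, or null for
 * an empty list.
 *
 * @param string[] $candidates
 */
function mostSimilarString(array $candidates): ?string
{
    $candidates = array_values($candidates); // normalise to 0..n-1 keys
    $bestIndex = null;
    $bestScore = -1.0;

    foreach ($candidates as $i => $current) {
        $score = 0.0;
        foreach ($candidates as $j => $other) {
            if ($i === $j) {
                continue; // comparing an entry with itself adds no information
            }
            similar_text($current, $other, $percent); // $percent is set by reference
            $score += $percent;
        }
        // Strict ">" means the first candidate wins on ties.
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestIndex = $i;
        }
    }

    return $bestIndex === null ? null : $candidates[$bestIndex];
}

echo mostSimilarString(['abc', 'abc', 'xyz']), PHP_EOL;
```

Using the percentage rather than the raw count keeps long strings from scoring high just because they contain more characters.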