How to get most important occurrences from an array?

Question

38 views Asked by Bob van Luijt At 17 June 2015 at 09:34

First of all, this is not a language-specific question. The example below uses PHP, but it's more about the method (regex?) for finding the answer.

Let's say I have an array:

$array = ['The Bert and Ernie game', 'The Bert & Ernie game', 'Bert and Ernie game', 'Bert and Ernie game - english version', 'Bert & Ernie (game)', 'Bert and Ernie - game']; // etc.

I want to fetch the combination of words that best represents the most important occurrences across these values. So I want to do something like:

$magicPattern = [something that renders most important occurrences];
preg_match($magicPattern, $array, $matches);
print_r($matches);

As an output I would like to receive something like: "Bert and Ernie game"

PS: I'm not necessarily looking for an actual array; a concept for doing this would be great too.

UPDATE:
My current code is below. Any thoughts on whether this is a good way of finding the best version of an occurrence? I'm having a hard time figuring that out from the source of the function.

$array['The Bert and Ernie game']               =0; //lev distance
$array['The Bert & Ernie game']                 =0; //lev distance
$array['Bert and Ernie game']                   =0; //lev distance
$array['Bert and Ernie game - english version'] =0; //lev distance
$array['Bert & Ernie (game)']                   =0; //lev distance
$array['Bert and Ernie - game']                 =0; //lev distance

foreach($array as $currentKey => $currentVal){
    foreach($array as $matchKey => $matchVal){
        $array[$currentKey] += levenshtein($currentKey, $matchKey);
    }
}

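// Sort by total distance (ascending) and keep the strings as keys.
// array_flip() would silently drop strings that share the same total.
asort($array);

echo array_keys($array)[0]; //Bert and Ernie game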

Original Q&A

There are 2 answers

Ali On 17 June 2015 at 09:43

You need something that looks at each value and computes a numerical weight; then sort the array by weight and take the topmost item.

The weight is your "importance", so you can, for example, choose to assign higher weights to terms you consider more important.
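As a rough sketch of that idea (not a definitive implementation): the weighting below is an assumption chosen for illustration. A word scores the number of entries that contain it minus half the number of entries, so words shared by most entries add weight and rare words subtract it. The names `tokenize()` and `rankByWeight()` are hypothetical helpers, not built-in PHP functions.

```php
<?php
// Hypothetical helper: split a string into unique lowercase words.
function tokenize(string $text): array
{
    // Letters and digits form words; punctuation such as "&", "(" or "-" is dropped.
    preg_match_all('/[\p{L}\p{N}]+/u', strtolower($text), $matches);
    return array_unique($matches[0]);
}

// Hypothetical weighting (an assumption, not from the answer):
// word score = (entries containing the word) - (half the number of entries).
function rankByWeight(array $entries): array
{
    // Count in how many entries each word appears.
    $frequency = [];
    foreach ($entries as $entry) {
        foreach (tokenize($entry) as $word) {
            $frequency[$word] = ($frequency[$word] ?? 0) + 1;
        }
    }

    // An entry's weight is the sum of its word scores.
    $half = count($entries) / 2;
    $weights = [];
    foreach ($entries as $entry) {
        $weights[$entry] = 0;
        foreach (tokenize($entry) as $word) {
            $weights[$entry] += $frequency[$word] - $half;
        }
    }

    arsort($weights); // highest weight first, entries stay as keys
    return $weights;
}

$entries = [
    'The Bert and Ernie game',
    'The Bert & Ernie game',
    'Bert and Ernie game',
    'Bert and Ernie game - english version',
    'Bert & Ernie (game)',
    'Bert and Ernie - game',
];

$ranked = rankByWeight($entries);
print_r($ranked);
```

With this particular weighting, 'Bert and Ernie game' and 'Bert and Ernie - game' tie for the highest weight because they contain the same words once punctuation is stripped. A different weighting (for example, hand-picked weights for terms you care about) would rank the entries differently.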

**Wolph** · Accepted Answer · 2015-06-17T09:44:11+00:00

There are many ways to solve a problem like this; personally, I wouldn't recommend a regex. This is typically something you would solve with a full-text search index (search for "full-text search" to find many methods for doing this).

For this particular case, assuming you don't have too much data, you could just compute the Levenshtein distance: http://php.net/manual/en/function.levenshtein.php

Or use the similar_text() function: http://php.net/manual/en/function.similar-text.php
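A minimal sketch of the similar_text() variant, assuming "most important" means "most similar to all the other entries": each entry is scored by the sum of its similarity percentages against every other entry, and the highest total wins. `mostRepresentative()` is a hypothetical helper name; only `similar_text()` itself is a built-in.

```php
<?php
// Hypothetical helper: return the entry with the highest total similarity
// (the percentage reported by similar_text()) to all other entries.
function mostRepresentative(array $entries): string
{
    if (count($entries) === 0) {
        throw new InvalidArgumentException('Expected at least one entry.');
    }

    $entries = array_values($entries); // make sure indexes run 0..n-1
    $scores = [];
    foreach ($entries as $i => $current) {
        $scores[$i] = 0.0;
        foreach ($entries as $j => $other) {
            if ($i === $j) {
                continue; // skip comparing an entry with itself
            }
            // The third argument receives the similarity as a percentage.
            similar_text($current, $other, $percent);
            $scores[$i] += $percent;
        }
    }

    arsort($scores); // highest total similarity first
    return $entries[array_keys($scores)[0]];
}

$entries = [
    'The Bert and Ernie game',
    'The Bert & Ernie game',
    'Bert and Ernie game',
    'Bert and Ernie game - english version',
    'Bert & Ernie (game)',
    'Bert and Ernie - game',
];

echo mostRepresentative($entries), PHP_EOL;
```

Like the nested Levenshtein loop, this compares every entry with every other entry, so the work grows quadratically with the number of entries. That fits the "not too much data" case; beyond that, a full-text index is the more realistic option.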
