"Distance" between data variables in PHP

Is there any way (using libraries if necessary) to normalize any PHP variable (integers, strings, files, byte arrays, etc.) so that the distance between two values can be measured?

By distance I mean that F("hello") should be close to F("hell").

However, this should work not only for strings but among any kind of data.

I thought of converting everything to binary first, but PHP's bit management is not so straightforward. In C++, this can be done much more easily.

For example, I should be able to calculate the distance between f("hello") and f(3333) (different data types).

Maybe dumping everything to a bytearray?

Thanks

There is 1 answer

ThomasVdBerge

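PHP's built-in levenshtein() function might be something to look into. It returns the edit distance between two strings: the minimum number of single-character insertions, deletions, and replacements needed to turn one into the other.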

From the php.net page:

<?php
// input misspelled word
$input = 'carrrot';

// array of words to check against
$words  = array('apple','pineapple','banana','orange',
                'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}

?>

The above example will output:

Input word: carrrot
Did you mean: carrot?
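
levenshtein() only accepts strings, so to compare mixed types as asked in the question, each value would first have to be turned into a byte string. The following is only a rough sketch of that idea, not a definitive approach: toBytes() and dataDistance() are hypothetical helper names, and the choice to cast scalars to strings and serialize() everything else is an assumption that may not suit every use case. The result is divided by the longer length so that it falls between 0 (identical) and 1 (nothing in common).

```php
<?php
// Hypothetical helper: turn a PHP value into a byte string.
// Assumption: scalars are cast to string (so 3333 and "3333" compare equal),
// anything else (arrays, objects) goes through serialize().
function toBytes($value): string
{
    if (is_string($value)) {
        return $value;
    }
    if (is_int($value) || is_float($value) || is_bool($value)) {
        return (string) $value;
    }
    return serialize($value);
}

// Hypothetical helper: byte-level edit distance between two values,
// scaled to the range 0.0 (identical) .. 1.0 (nothing in common).
function dataDistance($a, $b): float
{
    $x = toBytes($a);
    $y = toBytes($b);

    $maxLen = max(strlen($x), strlen($y));
    if ($maxLen === 0) {
        // two empty values are treated as identical
        return 0.0;
    }

    return levenshtein($x, $y) / $maxLen;
}

echo dataDistance('hello', 'hell'), "\n"; // 0.2
echo dataDistance('hello', 3333), "\n";   // 1
echo dataDistance(3333, 3334), "\n";      // 0.25
```

Keep in mind that levenshtein() compares bytes rather than multibyte characters, its cost grows with the product of the two lengths, and before PHP 8.0 it returned -1 for strings longer than 255 characters. For file contents (read with file_get_contents(), for instance) this sketch is therefore only practical on small inputs.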