Asked  7 Months ago    Answers:  5   Viewed   38 times

Where can I find algorithms that values the spelling of misplaced characters more accurately than levenshtein() and php similar_text() methods?

Example:

similar_text('jonas', 'xxjon', $similar); echo $similar; // returns 60
similar_text('jonas', 'asjon', $similar); echo $similar; // returns 60 <- although more similar!
echo levenshtein('jonas', 'xxjon'); // returns 4
echo levenshtein('jonas', 'asjon'); // returns 4  <- although more similar!

/ Jonas

 Answers

22

Here's a solution that I've come up to. It's based on Tim's suggestion of comparing the order of subsequent charachters. Some results:

  • jonas / jonax : 0.8
  • jonas / sjona : 0.68
  • jonas / sjonas : 0.66
  • jonas / asjon : 0.52
  • jonas / xxjon : 0.36

I'm sure i isn't perfect, and that it could be optimized, but nevertheless it seems to produce the results that I'm after... One weak spot is that when strings have different length, it produces different result when the values are swapped...

static public function string_compare($str_a, $str_b) 
{
    $length = strlen($str_a);
    $length_b = strlen($str_b);

    $i = 0;
    $segmentcount = 0;
    $segmentsinfo = array();
    $segment = '';
    while ($i < $length) 
    {
        $char = substr($str_a, $i, 1);
        if (strpos($str_b, $char) !== FALSE) 
        {               
            $segment = $segment.$char;
            if (strpos($str_b, $segment) !== FALSE) 
            {
                $segmentpos_a = $i - strlen($segment) + 1;
                $segmentpos_b = strpos($str_b, $segment);
                $positiondiff = abs($segmentpos_a - $segmentpos_b);
                $posfactor = ($length - $positiondiff) / $length_b; // <-- ?
                $lengthfactor = strlen($segment)/$length;
                $segmentsinfo[$segmentcount] = array( 'segment' => $segment, 'score' => ($posfactor * $lengthfactor));
            } 
            else 
            {
                $segment = '';
                $i--;
                $segmentcount++;
            } 
        } 
        else 
        {
            $segment = '';
            $segmentcount++;
        }
        $i++;
    }   

    // PHP 5.3 lambda in array_map      
    $totalscore = array_sum(array_map(function($v) { return $v['score'];  }, $segmentsinfo));
    return $totalscore;     
}
Wednesday, March 31, 2021
 
McAn
answered 7 Months ago
67
function substr_with_ellipsis($string, $chars = 100)
{
    preg_match('/^.{0,' . $chars. '}(?:.*?)b/iu', $string, $matches);
    $new_string = $matches[0];
    return ($new_string === $string) ? $string : $new_string . '&hellip;';
}
Wednesday, March 31, 2021
 
mattltm
answered 7 Months ago
86

You should point to your vendor/autoload.php at Settings | PHP | PHPUnit when using PHPUnit via Composer.

This blog post has all the details (with pictures) to successfully configure IDE for such scenario: http://confluence.jetbrains.com/display/PhpStorm/PHPUnit+Installation+via+Composer+in+PhpStorm

Related usability ticket: http://youtrack.jetbrains.com/issue/WI-18388

P.S. The WI-18388 ticket is already fixed in v8.0

Wednesday, March 31, 2021
 
ojrac
answered 7 Months ago
79

On Mac OS X environment variables available in Terminal and for the normal applications can be different, check the related question for the solution how to make them similar.

Note that this solution will not work on Mountain Lion (10.8).

Saturday, May 29, 2021
 
Nate
answered 5 Months ago
11

Are you interested in reducing the time complexity or the space complexity ? The average time complexity can be reduced O(n + d^2), where n is the length of the longer string and d is the edit distance. If you are only interested in the edit distance and not interested in reconstructing the edit sequence, you only need to keep the last two rows of the matrix in memory, so that will be order(n).

If you can afford to approximate, there are poly-logarithmic approximations.

For the O(n +d^2) algorithm look for Ukkonen's optimization or its enhancement Enhanced Ukkonen. The best approximation that I know of is this one by Andoni, Krauthgamer, Onak

Saturday, June 12, 2021
 
OMGKurtNilsen
answered 5 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :