Based on this paper: IEEE TRANSACTIONS ON PATTERN ANALYSIS, "Computation of Normalized Edit Distance and Applications". In this paper, normalized edit distance is defined as follows:

Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P)/L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (the length of P).
Can I safely implement the normalized edit distance algorithm explained above as this:
normalized edit distance =
levenshtein(query 1, query 2)/max(length(query 1), length(query 2))
You are probably misunderstanding the metric. There are two issues:

- The normalization step divides W(P), the weight of the edit path, by L(P), the length of the edit path, not by the maximum length of the strings as you did.
- The paper also showed (Example 3.1) that the normalized edit distance cannot simply be computed from the Levenshtein distance. You probably need to implement their algorithm.
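For reference, the paper's approach can be sketched as a dynamic program over the path length: let D[k][i][j] be the minimum weight of an edit path of length k turning the first i symbols of X into the first j symbols of Y, then minimize D[k][n][m]/k over all feasible k. This is only a sketch, assuming a match weight of 0 and parameterized insertion/deletion/substitution weights (w_ins, w_del, w_sub are my names, not the paper's), with the O(n·m·(n+m)) time behavior the paper describes:

```python
def normalized_edit_distance(x, y, w_ins=1, w_del=1, w_sub=1):
    """Minimize W(P)/L(P) over all edit paths P (sketch of Marzal & Vidal's DP).

    Assumes matching identical symbols costs 0; other weights are parameters.
    """
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 0.0
    INF = float("inf")
    kmax = n + m  # longest possible edit path: delete everything, insert everything
    # D[k][i][j]: min weight of a length-k edit path turning x[:i] into y[:j]
    D = [[[INF] * (m + 1) for _ in range(n + 1)] for _ in range(kmax + 1)]
    D[0][0][0] = 0.0
    for k in range(1, kmax + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                best = INF
                if i > 0:  # delete x[i-1]
                    best = min(best, D[k - 1][i - 1][j] + w_del)
                if j > 0:  # insert y[j-1]
                    best = min(best, D[k - 1][i][j - 1] + w_ins)
                if i > 0 and j > 0:  # match (cost 0) or substitute
                    cost = 0 if x[i - 1] == y[j - 1] else w_sub
                    best = min(best, D[k - 1][i - 1][j - 1] + cost)
                D[k][i][j] = best
    # Normalize each path weight by its length and take the minimum
    return min(D[k][n][m] / k for k in range(1, kmax + 1) if D[k][n][m] < INF)
```

With skip weight 2 (as in the example below) and an assumed substitution weight of 3, this reproduces the 8/6 ≈ 1.33 value from Example 3.1 (c): `normalized_edit_distance("aaab", "abbb", w_ins=2, w_del=2, w_sub=3)`.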
An explanation of Example 3.1 (c):

To go from aaab to abbb, the paper used the following transformations:

- substitute a with a (a match);
- delete a in the first string;
- delete a in the first string;
- insert b in the second string;
- insert b in the second string;
- substitute b with b (a match).

These are 6 operations, which is why L(P) is 6. From the weight matrix in (a), matching has cost 0 and skipping (insertion or deletion) has cost 2, so we have a total cost of 0 + 2 + 2 + 2 + 2 + 0 = 8, which is exactly W(P), and W(P)/L(P) = 8/6 ≈ 1.33. Similar results can be obtained for (b), which I'll leave to you as an exercise :-)
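A quick sanity check of the arithmetic in (c), listing the per-operation weights of that edit path:

```python
# Per-operation weights of the edit path in Example 3.1 (c):
# match a, delete a, delete a, insert b, insert b, match b
path_weights = [0, 2, 2, 2, 2, 0]

W = sum(path_weights)         # W(P) = 8
L = len(path_weights)         # L(P) = 6
print(W, L, round(W / L, 2))  # 8 6 1.33
```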