I have 2 queries:
query1:你好世界
query2:你好
When I run this code using the Python library Levenshtein:
from Levenshtein import distance, hamming, median
lev_edit_dist = distance(query1,query2)
print lev_edit_dist
I get an output of 12. How is the value 12 derived?
In terms of stroke differences, there's definitely more than 12.
According to its documentation, it supports Unicode:
You need to make sure the Chinese characters are Unicode strings, though:
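To illustrate why the string type matters, here is a minimal sketch that compares the same two queries as Unicode strings and as encoded bytes. It uses a small hand-written `edit_distance` helper (a hypothetical stand-in, not the Levenshtein library's own code) so that it runs without any third-party package. The byte-level count depends on which encoding the text is stored in (UTF-8 is assumed here), so it need not match the figure reported in the question.

```python
# -*- coding: utf-8 -*-

def edit_distance(a, b):
    """Plain dynamic-programming edit distance over any two sequences.

    Illustrative stand-in for a library call; works on str and bytes alike.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

query1 = u"你好世界"
query2 = u"你好"

# Compared as Unicode strings, each Chinese character is one unit.
char_dist = edit_distance(query1, query2)

# Compared as UTF-8 bytes (an assumed encoding), each character is three units.
byte_dist = edit_distance(query1.encode("utf-8"), query2.encode("utf-8"))

print(char_dist)  # 2
print(byte_dist)  # 6
```

Either way, the distance counts whole characters or bytes that are inserted, deleted, or substituted. It never looks at the strokes inside a character.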