User-defined match terms for string distance calculation in R


The {stringdist} package (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf) offers many string distance methods in R. Is it possible to include user-defined match terms, via regex or some other mechanism, in the Jaro or Jaro-Winkler distance calculations? If not, are there other packages that provide this kind of functionality?

For example, take the strings "USA Starwar Corporation" (a), "US Starwar Corporation" (b), and "United States Starwar Corporation" (c). Currently the Jaro distances for the pairs (a, b), (b, c), and (a, c) are 0.01449275, 0.2020202, and 0.216513, respectively. Is there any way to declare that "USA", "US", and "United States" match each other in the calculation, so that all three distances become 0?
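
To make the desired behaviour concrete, here is a minimal sketch of one possible workaround, assuming it is acceptable to rewrite the synonyms to a single canonical form before computing the distance. The `synonyms` table and the `normalize_terms` helper are hypothetical names made up for this illustration; they are not part of {stringdist}. Only `stringdistmatrix()` with `method = "jw"` and `p = 0` (plain Jaro) comes from the package.

```r
# Hypothetical synonym table (an assumption for illustration):
# names are regex patterns, values are the canonical replacement.
synonyms <- c(
  "\\b(USA|US|United States)\\b" = "US"
)

# Replace every synonym pattern in x with its canonical form.
normalize_terms <- function(x, table = synonyms) {
  for (pattern in names(table)) {
    x <- gsub(pattern, table[[pattern]], x, perl = TRUE)
  }
  x
}

companies <- c(
  a = "USA Starwar Corporation",
  b = "US Starwar Corporation",
  c = "United States Starwar Corporation"
)

normalized <- normalize_terms(companies)

# Pairwise Jaro distances on the normalized strings.
# Guarded so the sketch still runs when stringdist is not installed.
if (requireNamespace("stringdist", quietly = TRUE)) {
  d <- stringdist::stringdistmatrix(normalized, normalized,
                                    method = "jw", p = 0)
  print(d)
}
```

After normalization all three strings are identical, so any distance between them is 0. This sidesteps the question rather than answering it: the matching happens in a preprocessing step, not inside the Jaro calculation itself.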

Thanks!


There are 0 answers