# R: Comparing Text Similarity between Neighbour Strings

I am trying to compare the texts in a column to measure their similarity in terms of swapped adjacent letters: how many substitutions of two adjacent letters are necessary to make both texts the same.

Example: JANE-JNAE (1 - AN/NA), MARY-MART(0), CLERA-LCREA(2 - CL/LC & ER/RE)

I have tried stringdist methods but they do not provide solutions for my problem.

Since I am new to R, I could not write efficient code, but here is what I have so far:

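``````
substition <- function(text1, text2) {

  # identical strings need no further comparison
  if (text1 == text2) {
    return(TRUE)
  }

  # strings of different lengths cannot be compared letter by letter
  if (nchar(text1) != nchar(text2)) {
    return(FALSE)
  }

  # split each string into a vector of single characters
  vec1 <- strsplit(text1, split = "")[[1]]
  vec2 <- strsplit(text2, split = "")[[1]]

  # (can't go on)
}
``````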

To illustrate, the data looks something like this:

``````
df$NO  df$names
1      JANE
2      MARY
3      CLERA
4      JNAE
5      LCREA
6      MART
``````

and the desired output is:

``````
df$NO  df$names df$substition
1      JANE     1
2      MARY     0
3      CLERA    2
4      JNAE     1
5      LCREA    2
6      MART     0
``````


You can use the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance) between strings. The distance gives the minimal number of insertions, deletions and substitutions needed to transform one string into another.

Usage

``````
adist(
  c("lazy", "lasso", "lassie"),
  c("lazy", "lazier", "laser")
)
``````

Returns a 3x3 matrix of distances:

``````
##      [,1] [,2] [,3]
## [1,]    0    3    3
## [2,]    3    4    2
## [3,]    4    3    3
``````
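
Note that the Levenshtein distance counts a swap of two adjacent letters as two substitutions, so `adist("JANE", "JNAE")` gives 2 rather than the 1 asked for in the question. A minimal sketch of one way to count swapped adjacent pairs directly is shown below. It assumes the two strings have the same length and that a swap means two neighbouring letters appearing in reversed order. The helper name `count_adjacent_swaps` is hypothetical and not part of base R or any package.

```r
# Hypothetical helper: counts places where two adjacent letters of `a`
# appear in swapped order in `b`. Assumes both strings have equal length;
# returns NA otherwise.
count_adjacent_swaps <- function(a, b) {
  if (nchar(a) != nchar(b)) {
    return(NA_integer_)
  }
  x <- strsplit(a, split = "")[[1]]
  y <- strsplit(b, split = "")[[1]]
  n <- length(x)
  swaps <- 0L
  i <- 1L
  while (i < n) {
    # a swap: letters differ at position i, and the pair (i, i + 1) is reversed
    if (x[i] != y[i] && x[i] == y[i + 1L] && x[i + 1L] == y[i]) {
      swaps <- swaps + 1L
      i <- i + 2L  # skip past the pair that was just counted
    } else {
      i <- i + 1L
    }
  }
  swaps
}

count_adjacent_swaps("JANE", "JNAE")    # 1
count_adjacent_swaps("MARY", "MART")    # 0
count_adjacent_swaps("CLERA", "LCREA")  # 2

# For comparison: Levenshtein distance treats the single swap as two edits
adist("JANE", "JNAE")[1, 1]             # 2
```

This only compares one pair at a time. Deciding which names in `df$names` should be paired with each other (for example `JANE` with `JNAE`) is a separate step that the sketch does not attempt.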