Calculate edit distance percentage

Question

Calculate edit distance percentage

302 views Asked by El David At 19 December 2016 at 18:53

I am attempting to get a percentage of an edit distance from a group of sequences. So far this is what I have:

library(stringdist)

sequence <- c("CA--------W----------------------EKDRRTEAF---F------",
   "CA--------W----------------------EKDRRTEAF---F------", 
   "CA--------S-------------------SLVFGQGDNIQY---F------", 
   "RA--------S-------------------SLIYSP----LH---F------")

edit_dist <- stringdistmatrix(sequence)
#0 
#13 13 
#14 14 11

len <- stri_length(gsub('-', '', sequence))
#13    13    16    12

As each line of len is equivalent to each line of sequence, when comparing the two lines I would like to use the largest len to get the percentage. So as when having an edit distance between the second and third sequence it would use the length of 16 rather than 13 to get a percentage.

I know this code is wrong, but it is generally the idea I am going for:

for (i in len) {
  num1 <- len[i]
  for (j in len){
    num2 <- len[j] 
    if (num2 > num1){
        num <- num2
        }else{
          num <- num1
        }
    }
    edit_dist/num
}

The answer should look something similar to that below:

0
.8125  .8125
1.0769  1.0769  .6875

Original Q&A

There are 1 answers

**alistaire** · Accepted Answer · 2016-12-19T19:42:01+00:00

You can construct a suitable matrix of the max length with outer and pmax, which you can then coerce to dist class (like edit_dist) so you can divide:

edit_dist <- stringdistmatrix(sequence)
n <- nchar(gsub('-', '', sequence))

edit_dist / as.dist(outer(n, n, pmax))
##          1        2        3
## 2 0.000000                  
## 3 0.812500 0.812500         
## 4 1.076923 1.076923 0.687500

TechQA.

Calculate edit distance percentage

There are 1 answers

Related Questions in R

Related Questions in BIOINFORMATICS

Related Questions in STRING-COMPARISON

Related Questions in EDIT-DISTANCE

Popular Questions

Popular Tags

Trending Questions