Nested for loops and nested if...else statements very slow

151 views Asked by At

I am fairly new with R and I wrote a piece of code to iterate over a data frame (cell by cell) to perform a 'simple' check by if...else statements but it turns out it is very slow.

here the code

for(m in 1:(ncol(df) - 1)){
    listmatch <- unlist(subset(dt, Experiment == m & Number == m)[[which(colnames(dt) == "id")]])
    for(n in 1:nrow(df)){
      if (is.na(df[n, m]) == TRUE){
        df[n, m] <- NA
      } else {
        if (sum(as.numeric(unlist(strsplit(as.character(df[n, 33]), split = ";"))) %in% listmatch, na.rm = TRUE) > 0){
          df[n, m] <- 1
        } else {
          df[n, m] <- NA
      }}
    next}
  next}

Just to give some dimensions, df is 80000 obs of 33 variable, listmatch is on average a vector of 5000 values, while the column 33 of df contains information such as

1;2;3;4;5;6;7;8;9;10;11;12;13...

on average 150 values separated by semicolumn. I do not understand why it is very slow, neither how to optimize it and make it faster. Is it more the nested for loops or the nested if...else statements that make it slow? Any suggestions?

Here an example of my data df

        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]     [,33]
 [1,]    NA    NA    NA    NA    NA     1    NA     1;8;9;6
 [2,]     1    NA    NA    NA    NA    NA    NA           5
 [3,]     1    NA    NA     1    NA    NA    NA           6
 [4,]     1    NA     1    NA     1     1     1   88;98;125
 [5,]    NA    NA    NA    NA    NA    NA    NA         2;3
 [6,]     1    NA    NA    NA    NA    NA    NA         4;5
 [7,]    NA    NA    NA    NA    NA    NA     1          44
 [8,]    NA    NA     1     1     1    NA    NA          46
 [9,]    NA     1     1    NA    NA    NA    NA          78
[10,]    11    NA    NA    NA    NA    NA    NA     147;411
[11,]    NA     1    NA    NA    NA    NA    NA         400
[12,]    NA     1     1    NA    NA    NA    NA         658
[13,]    NA     1    NA     1    NA    NA    NA 87;988;1226
[14,]    NA    NA    NA    NA     1     1    NA 121;232;321
[15,]    NA    NA     1    NA    NA     1    NA           5

while dt looks like this

           Text Experiment Number    id
 [1,]        df          5      3  1
 [2,]        fd          8      8  8
 [3,]       reh          7      4  6
 [4,]   egfdgsd          4      8  8
 [5,]  fbdsgryj          5      3  44
 [6,] rtryyukfn          7      2  411
 [7,]    fgsfde          5      6  400
 [8,] wthrthmgh          2      7  1274
 [9,]      vdfd          2      8  658
[10,]       vfr          8      6  147
[11,]     brtey          3      2  1226
[12,]     tyirt          7      5  988
[13,]       fyi          5      4  87
[14,]     ywrtj          7      5  46
[15,]      kyou          8      6  4
[16,]      pkjh          1      7  8
[17,]    retuyk          5      4  9
[18,]        re          7      8  11
[19,]     eryre          6      3  12
[20,]      fdhd          6      1  18

I forgot to mention that I would like to keep the code

listmatch <- unlist(subset(dt, Experiment == m & Number == m)[[which(colnames(dt) == "id")]])

as it is, at the moment I am filtering based on the value of m but I would like in the future to apply manually different filtering, i.e. ...Experiment == m*2 & Number == m*5)...

0

There are 0 answers