R: checking if row elements are within bounds


I am working with a large set of combinations and would like a way to eliminate a portion of them: every combination whose elements are close together (where I decide what counts as close) should be removed. (Note: the example may take a second or two to run; it's rather large.)

Let's make an example:

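library(gtools)

# 11 support points: 0.0, 0.1, ..., 1.0
support <- matrix(seq(0, 1, by = 0.1), ncol = 1)
# every ordered choice of 3 distinct points: 11 * 10 * 9 = 990 rows
support.n <- as.matrix(permutations(length(support), 3, support))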

But now I would like to eliminate all rows where any two elements are "close" (say, within +/-0.2 of each other, inclusive). In other words, how do I convert:

...
[964,]  1.0  0.7  0.0
[965,]  1.0  0.7  0.1
[966,]  1.0  0.7  0.2
[967,]  1.0  0.7  0.3
[968,]  1.0  0.7  0.4
[969,]  1.0  0.7  0.5
[970,]  1.0  0.7  0.6
[971,]  1.0  0.7  0.8
[972,]  1.0  0.7  0.9
[973,]  1.0  0.8  0.0
[974,]  1.0  0.8  0.1
[975,]  1.0  0.8  0.2
[976,]  1.0  0.8  0.3
[977,]  1.0  0.8  0.4
[978,]  1.0  0.8  0.5
[979,]  1.0  0.8  0.6
[980,]  1.0  0.8  0.7
[981,]  1.0  0.8  0.9
[982,]  1.0  0.9  0.0
[983,]  1.0  0.9  0.1
[984,]  1.0  0.9  0.2
[985,]  1.0  0.9  0.3
[986,]  1.0  0.9  0.4
[987,]  1.0  0.9  0.5
[988,]  1.0  0.9  0.6
[989,]  1.0  0.9  0.7
[990,]  1.0  0.9  0.8

into the much thinner:

...
[964,]  1.0  0.7  0.0
[965,]  1.0  0.7  0.1
[966,]  1.0  0.7  0.2
[967,]  1.0  0.7  0.3
[968,]  1.0  0.7  0.4
[969,]  1.0  0.7  0.5

(The row indices above are not the real ones, since I didn't work out how they would change.) I have been looking at any(x) and various which()-style commands, but can't seem to make them do this.


There are 3 answers

Answer by TheComeOnMan (best answer)

This should work for you. Inside the function, x is each row of support.n in turn: combn() builds every pair of that row's elements, and the smallest absolute difference between any pair is compared against 0.2. That comparison gives one TRUE/FALSE per row, which is then used to subset support.n:

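support.n[
  apply(support.n, 1, function(x) {
    # combn(x, 2) puts each pair of row elements in its own column;
    # diff() on that 2-row matrix gives one difference per pair
    min(abs(diff(combn(x, 2))))
  }) > 0.2,
]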
Answer by Neal Fultz

My answer is very similar and gives the same result. I don't think the combn() step is really necessary: once a row is sorted, the closest pair of elements is always a pair of neighbours, so it is enough to sort each row and take the diff of that:

# sorted rows: only neighbouring elements can be the closest pair
support.n[apply(support.n, 1, function(x) all(diff(sort(x)) > .2)), ]
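The two answers above keep a row only when every gap is strictly greater than 0.2, which matches the inclusive rule in the question. However, seq(0, 1, by = 0.1) does not produce exact decimals, so some gaps that should be exactly 0.2 come out slightly above it; 0.7 - 0.5 is one, so a row such as 1.0 0.7 0.5 passes the `> .2` test. The sketch below is a hypothetical helper, far_apart(), built on the same sort-and-diff idea, with an assumed tolerance `tol` (1e-9 is an arbitrary choice) so that such boundary gaps count as close.

```r
# Hypothetical helper (not from any answer above): TRUE when every pair of
# elements in x is more than `threshold` apart. After sorting, the smallest
# gap is always between neighbours, so diff() of the sorted row is enough.
# `tol` is an assumed tolerance: gaps up to `tol` above `threshold` still
# count as close, so the inclusive "within 0.2" rule survives rounding error.
far_apart <- function(x, threshold = 0.2, tol = 1e-9) {
  all(diff(sort(x)) > threshold + tol)
}

s <- seq(0, 1, by = 0.1)
s[8] - s[6] > 0.2           # TRUE: 0.7 - 0.5 comes out slightly above 0.2
far_apart(s[c(11, 8, 6)])   # FALSE: 1.0, 0.7, 0.5 is dropped
far_apart(s[c(11, 8, 5)])   # TRUE: 1.0, 0.7, 0.4 is kept

# applied to the question's matrix:
# support.n[apply(support.n, 1, far_apart), ]
```

With this check, the row 1.0 0.7 0.5 from the question's sample output would also be removed, since 0.7 and 0.5 are exactly 0.2 apart.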
Answer by Chase

I think codoremifa and I are on the same page here. Our answers use the same logic and provide the same values for your sample data. Here's what I put together:

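f <- function(x, threshold = .2) {
  # each column of `combinations` holds the indices of one pair of elements
  combinations <- combn(length(x), 2)
  # keep the row only if every pair is more than `threshold` apart
  keep <- all(abs(x[combinations[1, ]] - x[combinations[2, ]]) > threshold)
  return(keep)
}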

Then apply function f, by row, to create an index of TRUE/FALSE values to select the appropriate rows of support.n:

a <- support.n[apply(support.n, 1, f), ]

To confirm that our answers were the same, I saved codoremifa's answer as b:

> all.equal(a,b)
[1] TRUE

For large objects, whether in number of rows or number of columns, you could avoid the overhead of calling combn() for every row by computing the pair indices once ahead of time and simply indexing with them in each row. This example, however, runs in 0.05 seconds on my machine, so it's not worth making the function more complicated here.
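Computing the combn() indices once can be taken a step further: with the pair indices known up front, the differences for all rows can be taken column by column, with no per-row apply() at all. This is a minimal sketch assuming a numeric matrix with at least two columns; filter_far_apart() is a hypothetical name.

```r
# Hypothetical helper: keep only the rows of m whose elements are pairwise
# more than `threshold` apart. The pair indices are computed once with combn()
# and the comparison runs over whole columns instead of row by row.
# Assumes ncol(m) >= 2 (combn() needs at least two columns to pair up).
filter_far_apart <- function(m, threshold = 0.2) {
  pairs <- combn(ncol(m), 2)   # each column holds one pair of column indices
  gaps <- abs(m[, pairs[1, ], drop = FALSE] - m[, pairs[2, ], drop = FALSE])
  keep <- rowSums(gaps <= threshold) == 0   # no pair within threshold
  m[keep, , drop = FALSE]
}

m <- rbind(c(1.0, 0.6, 0.0),
           c(1.0, 0.9, 0.0),   # 1.0 and 0.9 are too close
           c(0.1, 0.5, 0.9))
filter_far_apart(m)             # keeps rows 1 and 3
```

For support.n this should select the same rows as f(), since it applies the same strict `> threshold` comparison to the same pairwise differences; only the looping is moved from R code into vectorised matrix operations.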