R: partial sort of matrices by largest elements to index matrix and value matrix?

154 views Asked by At

Suppose a large sparse adjacency matrix. I want two matrices:

  1. 1st value matrix where first column for the largest element, second column for the second largest element and third for the third largest element
  2. 2nd index matrix where first column for the index of the largest element, second column for the index of the second largest element and third for the index of the third largest element

where the partial sort per vector is probably most efficiently done with sort's partial directive, like here https://stackoverflow.com/a/2453619/164148.

Does there exist some ready solutions for the partial sort of matrices by its largest elements to corresponding value/index matrices?

1

There are 1 answers

0
hhh On

max.col works for the maximum position while the order method below works for any number of maximal positions: we order the index matrix to descreasing order and pick the wanted count of largest positions. Similar to max.col, we could use which.max function

> which.max(c(100,1:10))
[1] 1
> max.col(c(100,1:10))
 [1] 1 1 1 1 1 1 1 1 1 1 1

and repeatedly finding the maximal position until not needed. We will next concentrate on the order method. Alternative A shows a case where the value matrix and the index matrix are equal. Alternative B shows a case where the value matrix and the index matrix are not equal.

Alternative A showing a case with index matrix equal to the value matrix.

> set.seed(123); r<-10; c<-10; m2 <- matrix(rbinom(r*c,70,0.5),r,c)
> m2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   35   37   41   40   37   34   36   31   35    33
 [2,]   31   36   38   31   33   30   42   38   38    35
 [3,]   39   34   31   43   28   32   37   28   35    32
 [4,]   34   33   37   29   38   37   44   33   42    39
 [5,]   39   41   40   45   34   39   34   33   38    30
 [6,]   32   34   35   30   38   29   33   33   32    31
 [7,]   40   32   34   39   34   28   39   37   35    41
 [8,]   39   37   33   36   27   38   37   33   32    38
 [9,]   35   40   36   39   40   25   31   32   40    38
[10,]   29   41   39   35   34   38   34   34   29    38

> t(apply(m2, 1, function(x) order(x,decreasing=T)[1:3]))
      [,1] [,2] [,3]
 [1,]    3    4    2
 [2,]    7    3    8
 [3,]    4    1    7
 [4,]    7    9   10
 [5,]    4    2    3
 [6,]    5    3    2
 [7,]   10    1    4
 [8,]    1    6   10
 [9,]    2    5    9
[10,]    2    3    6

that is the index matrix and its value matrix, in this case, is the same.

Alternative B showing where the index matrix not equal to the value matrix.

max.col returns the maximum position of each row of a matrix, breaking ties at random, but it does not have the partial directive

require(quanteda) 
mytext <- c("Let the big dogs hunt honor", "No holds barred, it is a honor child child child.", "My child is an honor student")     
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
> myMatrix %*% t(myMatrix)     #Tells how sentences are related to each other
# 3 x 3 sparse Matrix of class "dgCMatrix"
# text1 text2 text3
# text1     5     1     1
# text2     1    12     4
# text3     1     4     3
> t(myMatrix) %*% myMatrix     #Tells how words are related to each other
# 9 x 9 sparse Matrix of class "dgCMatrix"
# let big dog hunt honor hold bar child student
# let       1   1   1    1     1    .   .     .       .
# big       1   1   1    1     1    .   .     .       .
# dog       1   1   1    1     1    .   .     .       .
# hunt      1   1   1    1     1    .   .     .       .
# honor     1   1   1    1     3    1   1     4       1
# hold      .   .   .    .     1    1   1     3       .
# bar       .   .   .    .     1    1   1     3       .
# child     .   .   .    .     4    3   3    10       1
# student   .   .   .    .     1    .   .     1       1
#
> max.col(t(myMatrix) %*% myMatrix)
[1] 5 2 2 3 8 8 8 8 9
> 
> max.col(myMatrix %*% t(myMatrix))
[1] 1 2 2

and

> n<-length(myMatrix);max.col(t(myMatrix) %*% myMatrix,partial=n-1)
Error in max.col(t(myMatrix) %*% myMatrix, partial = n - 1) : 
   unused argument (partial = n - 1)

where unfortunately partial directive is not defined in the max.col command. So we test the method A:

> mytext <- c("Let the big dogs hunt honor", "No holds barred, it is a honor child child child.", "My child is an honor student")     
> myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
> aa<- myMatrix %*% t(myMatrix)
> aa
3 x 3 sparse Matrix of class "dgCMatrix"
      text1 text2 text3
text1     6     1     1
text2     1    18     5
text3     1     5     6
> t(apply(aa, 1, function(x) order(x,decreasing=T)[1:3]))
      [,1] [,2] [,3]
text1    1    2    3
text2    2    3    1
text3    3    2    1

that is correct and other way around we get

> aaa<- t(myMatrix) %*% myMatrix
> aaa
18 x 18 sparse Matrix of class "dgCMatrix"
   [[ suppressing 18 column names ‘let’, ‘the’, ‘big’ ... ]]

let     1 1 1 1 1 1 . . . . . . .  . . . . .
the     1 1 1 1 1 1 . . . . . . .  . . . . .
big     1 1 1 1 1 1 . . . . . . .  . . . . .
dog     1 1 1 1 1 1 . . . . . . .  . . . . .
hunt    1 1 1 1 1 1 . . . . . . .  . . . . .
honor   1 1 1 1 1 3 1 1 1 1 1 2 1  4 1 1 1 1
no      . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
hold    . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
bar     . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
,       . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
it      . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
is      . . . . . 2 1 1 1 1 1 2 1  4 1 1 1 1
a       . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
child   . . . . . 4 3 3 3 3 3 4 3 10 3 1 1 1
.       . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
my      . . . . . 1 . . . . . 1 .  1 . 1 1 1
an      . . . . . 1 . . . . . 1 .  1 . 1 1 1
student . . . . . 1 . . . . . 1 .  1 . 1 1 1
> mmm <- t(apply(aaa, 1, function(x) order(x,decreasing=T)[1:3]))
        [,1] [,2] [,3]
let        1    2    3
the        1    2    3
big        1    2    3
dog        1    2    3
hunt       1    2    3
honor     14    6   12
no        14    6    7
hold      14    6    7
bar       14    6    7
,         14    6    7
it        14    6    7
is        14    6   12
a         14    6    7
child     14    6   12
.         14    6    7
my         6   12   14
an         6   12   14
student    6   12   14

and the corresponding value matrix

> t(apply(mmm, 1, function(x) colnames(aaa)[x]))
        [,1]    [,2]    [,3]   
let     "let"   "the"   "big"  
the     "let"   "the"   "big"  
big     "let"   "the"   "big"  
dog     "let"   "the"   "big"  
hunt    "let"   "the"   "big"  
honor   "child" "honor" "is"   
no      "child" "honor" "no"   
hold    "child" "honor" "no"   
bar     "child" "honor" "no"   
,       "child" "honor" "no"   
it      "child" "honor" "no"   
is      "child" "honor" "is"   
a       "child" "honor" "no"   
child   "child" "honor" "is"   
.       "child" "honor" "no"   
my      "honor" "is"    "child"
an      "honor" "is"    "child"
student "honor" "is"    "child"

where the value matrix is not equal to the earlier index matrix.