R: sparse matrix multiplication with data.table and quanteda package?


I am trying to multiply sparse matrices created with the quanteda package, while also using the data.table package (related to this thread here). So:

require(quanteda) 

mytext <- c("Let the big dogs hunt", "No holds barred", "My child is an honor student")     
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) #a data.table
as.matrix(myMatrix) %*% transpose(as.matrix(myMatrix))

How can I get the matrix multiplication working here with the quanteda package and sparse matrices?

2

There are 2 answers

0
Ken Benoit (best answer)

This works just fine:

mytext <- c("Let the big dogs hunt", 
            "No holds barred", 
            "My child is an honor student")     
myMatrix <- dfm(mytext)

myMatrix %*% t(myMatrix)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##       text1 text2 text3
## text1     5     .     .
## text2     .     3     .
## text3     .     .     6

There is no need to coerce to a dense matrix using as.matrix(). Note that the result is no longer a "dfmSparse" object, because it is no longer a matrix of documents by features.
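The same point can be checked without quanteda. The following is only a minimal sketch using the Matrix package directly; the `counts` matrix and its `feat1`..`feat4` column names are made up for illustration and stand in for a dfm. It compares the sparse product with the dense detour through as.matrix().

```r
library(Matrix)  # recommended package, normally installed with R

# Hypothetical 3 x 4 document-by-feature counts (illustrative values only)
counts <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 3, 2, 4),
  x = c(2, 1, 3, 1, 2),
  dims = c(3, 4),
  dimnames = list(paste0("text", 1:3), paste0("feat", 1:4))
)

# Product computed directly on the sparse object
sparse_prod <- counts %*% t(counts)

# Same product after coercing to dense base matrices
dense_prod <- as.matrix(counts) %*% t(as.matrix(counts))

# The sparse route keeps a sparse class and gives the same numbers
is(sparse_prod, "sparseMatrix")
all.equal(as.matrix(sparse_prod), dense_prod)
```

For three short texts the difference is negligible, but a document-feature matrix over a real corpus is mostly zeros, so skipping the dense copy can save a lot of memory.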

5
hhh

Use t(), not transpose(), for the matrix multiplication. transpose() comes from data.table and is meant for lists and data frames, whereas t() is the matrix transpose:

as.matrix(myMatrix) %*% t(as.matrix(myMatrix))

Also, as noted in the comments, as.matrix() produces a dense matrix, while Matrix::Matrix() can produce a sparse one. Neither conversion is necessary here, so better:

myMatrix %*% t(myMatrix)

and potentially even better, since tcrossprod(x) computes x %*% t(x) and crossprod(x) computes t(x) %*% x:

crossprod(myMatrix) 
tcrossprod(myMatrix) 

However, these fail with the example in the question, with an error saying that they require numeric/complex matrix/vector arguments:

require(quanteda)  
mytext <- c("Let the big dogs hunt", "No holds barred", "My child is an honor student")      
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
crossprod(myMatrix) 
tcrossprod(myMatrix)
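For comparison, here is a minimal sketch of crossprod() and tcrossprod() on a plain sparse matrix from the Matrix package, where the package supplies sparse methods for both. The matrix `m` is hypothetical. The error quoted above is the message base R's crossprod() gives when it does not receive a numeric matrix, which suggests (an assumption, not checked against that quanteda version) that the sparse methods were not being dispatched for the dfm object.

```r
library(Matrix)  # provides sparse methods for crossprod() and tcrossprod()

# Hypothetical 2 x 3 sparse matrix: 2 "documents", 3 "features"
m <- sparseMatrix(i = c(1, 2, 2), j = c(1, 2, 3), x = c(1, 2, 3), dims = c(2, 3))

doc_by_doc   <- tcrossprod(m)  # equivalent to m %*% t(m), 2 x 2
feat_by_feat <- crossprod(m)   # equivalent to t(m) %*% m, 3 x 3

dim(doc_by_doc)
dim(feat_by_feat)

# Both agree with the explicit products
all.equal(as.matrix(doc_by_doc), as.matrix(m %*% t(m)))
all.equal(as.matrix(feat_by_feat), as.matrix(t(m) %*% m))
```

Note that tcrossprod(), not crossprod(), is the one that reproduces the documents-by-documents product myMatrix %*% t(myMatrix) from the question.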