Computing the null space of a bigmatrix in R

1.7k views Asked by At

I can not find any function or package to calculate the null space or (QR decomposition) of a bigmatrix (from library(bigmemory)) in R. For example:

library(bigmemory)

a <- big.matrix(1000000, 1000, type='double', init=0)

I tried the following but got the errors shown. How can I find the null space of a bigmemory object?

a.qr <- Matrix::qr(a)
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector
q.null <- MASS::Null(a)
# Error in as.vector(data) : 
#   no method for coercing this S4 class to a vector
2

There are 2 answers

6
F. Privé On BEST ANSWER

If you want to compute the full SVD of the matrix, you can use package bigstatsr to perform computations by block. A FBM stands for a Filebacked Big Matrix and is an object similar to a filebacked big.matrix object of package bigmemory.

library(bigstatsr)
options(bigstatsr.block.sizeGB = 0.5)

# Initialize FBM with random numbers
a <- FBM(1e6, 1e3)
big_apply(a, a.FUN = function(X, ind) {
  X[, ind] <- rnorm(nrow(X) * length(ind))
  NULL
}, a.combine = 'c')

# Compute t(a) * a
K <- big_crossprodSelf(a, big_scale(center = FALSE, scale = FALSE))

# Get v and d where a = u * d * t(v) the SVD of a
eig <- eigen(K[])
v <- eig$vectors
d <- sqrt(eig$values)

# Get u if you need it. It will be of the same size of u
# so that I store it as a FBM.
u <- FBM(nrow(a), ncol(a))
big_apply(u, a.FUN = function(X, ind, a, v, d) {
  X[ind, ] <- sweep(a[ind, ] %*% v, 2, d, "/")
  NULL
}, a.combine = 'c', block.size = 50e3, ind = rows_along(u),
a = a, v = v, d = d)

# Verification
ind <- sample(nrow(a), 1000)
all.equal(a[ind, ], tcrossprod(sweep(u[ind, ], 2, d, "*"), v))

This takes approximately 10 minutes on my computer.

4
Technophobe01 On

@Mahon @user20650 @F.Privė For clarity I pinged the bigmemory team and asked

Essentially, is there an implementation of the QR function (QR Decomposition) that works with big memory matrixes?

I felt it useful to get clarity on the original question asked. @F.Privė - nice answer. Hopefully your answer, and their response will help guide people in the future. Their response below:

Thanks for the note. There is not currently an implementation of the qr decomposition. Ideally, you would implement this using Householder reflections (if the matrix is dense) or Givens rotations (if it is sparse).

The irlba package is compatible with bigmemory. It provides a truncated singular value decomposition. So, if your matrix is relatively sparse, you could truncate at the rank of the matrix. This is probably your best option. If you don't know the rank then you can use the package to update the truncation iteratively.

Please note that if your matrix is (tall and skinny or short and fat) then the SO solution is OK. However, anytime you resort to calculating the cross-product you lose some numerical stability. This can be an issue if you are planning on inverting the matrix.