I have an object of class big.matrix in R with dimension 778844 x 2. The values are all integers (kilometres). My objective is to calculate the Euclidean distance matrix using the big.matrix and have as a result an object of class big.matrix. I would like to know if there is an optimal way of doing that.
The reason for my choice of using the class big.matrix is memory limitation. I could transform my big.matrix to an object of class matrix and calculate the Euclidean distance matrix using dist(). However, dist() would return an object of size that would not be allocated in the memory.
Edit
The following answer was given by John W. Emerson, author and maintainer of the bigmemory package:
You could use big algebra I expect, but this would also be a very nice use case for Rcpp via sourceCpp(), and very short and easy. But in short, we don't even attempt to provide high-level features (other than the basics which we implemented as proof-of-concept). No single algorithm could cover all use cases once you start talking out-of-memory big.
Here is a way using
RcppArmadillo. Much of this is very similar to the RcppGallery example. This will return abig.matrixwith the associated pairwise (by row) euclidean distances. I like to wrap mybig.matrixfunctions in a wrapper function to create a cleaner syntax (i.e. avoid the@addressand other initializations.Note - as we are using bigmemory (and therefore concerned with RAM usage) I have this example returned the N-1 x N-1 matrix of only lower triangular elements. You could modify this but this is what I threw together.
euc_dist.cpp
My little wrapper
The test