Sort elements of a NumericMatrix by dim names


I have a NumericMatrix m. Say m is the following (the dim names are listed after the values):

7 9 8
4 6 5
1 3 2

with column names = {"x", "z", "y"} and row names = {"z", "y", "x"}.

I want the following output:

1 2 3
4 5 6
7 8 9

with column names = {"x", "y", "z"} and row names = {"x", "y", "z"}.

So what I want to do is the following:

  1. Sort elements of each row according to the column names
  2. Permute the rows such that their corresponding row names are sorted

Is there an easy way to do this in Rcpp for a general NumericMatrix?

Answer by nrussell:

This isn't necessarily the simplest approach, but it appears to work:

#include <Rcpp.h>
#include <map>
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
Rcpp::NumericMatrix dim_sort(const Rcpp::NumericMatrix& m) {

  Rcpp::Function rownames("rownames");
  Rcpp::Function colnames("colnames");
  Rcpp::CharacterVector rn = rownames(m);
  Rcpp::CharacterVector cn = colnames(m);

  Rcpp::NumericMatrix result(Rcpp::clone(m));
  Rcpp::CharacterVector srn(Rcpp::clone(rn));
  Rcpp::CharacterVector scn(Rcpp::clone(cn));

  std::map<std::string, int> row_map;
  std::map<std::string, int> col_map;

  // Map each dim name to its original index; std::map keeps its keys sorted.
  // Rows and columns are filled separately so non-square matrices also work.
  for (int i = 0; i < rn.size(); i++) {
    row_map.insert(std::pair<std::string, int>(Rcpp::as<std::string>(rn[i]), i));
  }
  for (int j = 0; j < cn.size(); j++) {
    col_map.insert(std::pair<std::string, int>(Rcpp::as<std::string>(cn[j]), j));
  }

  typedef std::map<std::string, int>::const_iterator cit;
  int I = 0;  // destination row
  for (cit rm_it = row_map.begin(); rm_it != row_map.end(); ++rm_it, ++I) {
    int i = rm_it->second;  // source row
    srn[I] = rm_it->first;
    int J = 0;  // destination column
    for (cit cm_it = col_map.begin(); cm_it != col_map.end(); ++cm_it, ++J) {
      int j = cm_it->second;  // source column
      result(I, J) = m(i, j);
      scn[J] = cm_it->first;
    }
  }

  result.attr("dimnames") = Rcpp::List::create(srn, scn);
  return result;
}

/*** R

# The matrix from the question: row names z, y, x; column names x, z, y
x <- matrix(
  c(7,9,8,4,6,5,1,3,2),
  nrow = 3,
  dimnames = list(
    c("z", "y", "x"),
    c("x", "z", "y")
  ),
  byrow = TRUE
)

x
#   x z y
# z 7 9 8
# y 4 6 5
# x 1 3 2

dim_sort(x)
#   x y z
# x 1 2 3
# y 4 5 6
# z 7 8 9

*/

I used a std::map<std::string, int> for two reasons:

  1. A std::map keeps its entries sorted by key, so by using the dim names as keys, the container does the sorting for us.
  2. Storing each name's original position as the mapped value gives us the index needed to retrieve the matching element along that dimension.
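
The two points above can be seen in isolation with a minimal sketch in plain standard C++, with no Rcpp involved. The helper names `sorted_order` and `permute` are hypothetical and exist only for illustration; the matrix is assumed to be stored as a vector of rows, and dim names are assumed to be unique.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the original positions of
// `names`, listed in the sorted order of the names themselves.
std::vector<int> sorted_order(const std::vector<std::string>& names) {
  std::map<std::string, int> index_of;  // key: name, value: original position
  for (std::size_t i = 0; i < names.size(); ++i) {
    // insert() keeps the first entry if a name is repeated
    index_of.insert(std::make_pair(names[i], static_cast<int>(i)));
  }
  std::vector<int> order;
  order.reserve(index_of.size());
  // Iterating a std::map visits its keys in ascending order
  for (std::map<std::string, int>::const_iterator it = index_of.begin();
       it != index_of.end(); ++it) {
    order.push_back(it->second);
  }
  return order;
}

// Hypothetical helper: builds a new matrix whose rows and columns are taken
// from `m` in the given source orders.
std::vector<std::vector<double> > permute(
    const std::vector<std::vector<double> >& m,
    const std::vector<int>& row_order,
    const std::vector<int>& col_order) {
  std::vector<std::vector<double> > out;
  out.reserve(row_order.size());
  for (std::size_t I = 0; I < row_order.size(); ++I) {
    std::vector<double> row;
    row.reserve(col_order.size());
    for (std::size_t J = 0; J < col_order.size(); ++J) {
      row.push_back(m[row_order[I]][col_order[J]]);
    }
    out.push_back(row);
  }
  return out;
}
```

For the matrix in the question, `sorted_order` applied to the row names {"z", "y", "x"} gives the source rows 2, 1, 0, and applied to the column names {"x", "z", "y"} it gives the source columns 0, 2, 1. Passing both to `permute` reproduces the desired output.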