I want to get the column names of a matrix to set them on another one, but if the matrix does not have column names (or they are set to NULL), the following code crashes my R session:

CharacterVector cn = colnames(x);

This is how I currently get the column names of a matrix, even when it has none:

#include <Rcpp.h>
using namespace Rcpp;

// Get column names or empty
// [[Rcpp::export]]
CharacterVector get_colnames(const NumericMatrix &x) {
   CharacterVector cn;

   SEXP cnm = colnames(x);
   if (!Rf_isNull(cnm)) cn = cnm;

   return(cn);
}

Is there a more elegant way?

2 Answers

Dirk Eddelbuettel

I had started on this and then got distracted. @coatless has covered it; this is simply shorter.

Code

#include <Rcpp.h>

// [[Rcpp::plugins(cpp11)]]
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector getColnames(const NumericMatrix &x) {
  size_t nc = x.cols();
  SEXP s = x.attr("dimnames");  // could be nil or list
  if (!Rf_isNull(s)) {          // have dimnames ...
    List dn(s);
    if (!Rf_isNull(dn[1])) {    // ... with a colnames part: return it
      return(dn[1]);
    }
  }
  // no dimnames, or only rownames set: need to construct names
  CharacterVector res(nc);
  for (size_t i=0; i<nc; i++) {
    res[i] = std::string("V") + std::to_string(i);
  }
  return(res);
}

/*** R
m <- matrix(1:9,3,3)
getColnames(m)
colnames(m) <- c("tic", "tac", "toe")
getColnames(m)
*/

Output

R> Rcpp::sourceCpp("~/git/stackoverflow/55850510/answer.cpp")

R> m <- matrix(1:9,3,3)

R> getColnames(m)
[1] "V0" "V1" "V2"

R> colnames(m) <- c("tic", "tac", "toe")

R> getColnames(m)
[1] "tic" "tac" "toe"
R>
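The fallback names above start at V0 because the loop uses the zero-based index i, whereas R's own default names count from one (for example, as.data.frame() on an unnamed matrix produces V1, V2, ...). The small standalone C++ helper below is a sketch, not part of Rcpp: default_names and its prefix and first parameters are hypothetical. It makes the starting index explicit, and with an empty prefix and a start of 1 it also reproduces the "1", "2" labels that the seq_len() approach in the other answer generates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (plain C++, no Rcpp): builds prefix + index for each
// of n columns, counting from `first`.
//   first = 0 -> "V0", "V1", ... (what getColnames() produces)
//   first = 1 -> "V1", "V2", ... (R's usual one-based default names)
std::vector<std::string> default_names(std::size_t n,
                                       const std::string& prefix = "V",
                                       std::size_t first = 0) {
  std::vector<std::string> res;
  res.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    res.push_back(prefix + std::to_string(first + i));
  }
  return res;
}
```

Inside an Rcpp function, the resulting std::vector<std::string> could be returned through Rcpp::wrap(), which converts it to an R character vector.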
coatless

A few notes:

  1. Matrices do not always have colnames() or rownames() set.
    • If either is set, the object has a dimnames attribute.
  2. It's okay to check whether a value exists via R's C API.
    • e.g. Rf_isNull().
  3. An alternative existence check is to verify whether dimnames is among the object's attributes.
    • From there, check whether the relevant entry of dimnames is NULL.
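The two-level check described in these notes can be modeled in plain C++17, outside R, to make each case explicit. This is a sketch under assumptions: std::optional stands in for R's NULL, and DimNames, MatrixAttrs, and find_colnames are hypothetical names, not part of Rcpp or R's C API. The model shows why checking for dimnames alone is not enough: a matrix with only row names has a dimnames attribute whose column entry is still NULL.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an R character vector.
using Names = std::vector<std::string>;

// Hypothetical model of the `dimnames` attribute: a list of two entries
// (row names, column names), either of which may be NULL (std::nullopt).
struct DimNames {
  std::optional<Names> rownames;
  std::optional<Names> colnames;
};

// Hypothetical model of a matrix's attributes: `dimnames` may be absent.
struct MatrixAttrs {
  int nrow = 0;
  int ncol = 0;
  std::optional<DimNames> dimnames;
};

// Two-level check: (1) is `dimnames` present? (2) is its column entry non-NULL?
std::optional<Names> find_colnames(const MatrixAttrs& m) {
  if (!m.dimnames) return std::nullopt;            // no dimnames attribute at all
  if (!m.dimnames->colnames) return std::nullopt;  // e.g. only row names set
  return m.dimnames->colnames;
}
```

In Rcpp, the same two steps correspond to calling Rf_isNull() on the dimnames attribute and then on its second element.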

Let's verify the first point by creating a matrix without names and then one with names. Finally, we'll introduce a more verbose version of your function that falls back to generated names when a matrix has no column names.

Matrix construction

So, the traditional matrix construction would be:

x_no_names = matrix(1:4, nrow = 2)

x_no_names
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
colnames(x_no_names)
#> NULL
rownames(x_no_names)
#> NULL
attributes(x_no_names)
#> $dim
#> [1] 2 2

So, a matrix created without column or row names has no dimnames attribute.

What happens if we assign column or row names?

# Create a matrix with names
x_named = x_no_names
colnames(x_named) = c("Col 1", "Col 2")
rownames(x_named) = c("Row 1", "Row 2")

# View attributes
attributes(x_named)
#> $dim
#> [1] 2 2
#> 
#> $dimnames
#> $dimnames[[1]]
#> [1] "Row 1" "Row 2"
#> 
#> $dimnames[[2]]
#> [1] "Col 1" "Col 2"

# View matrix object
x_named
#>       Col 1 Col 2
#> Row 1     1     3
#> Row 2     2     4

Notice: The matrix object now has a dimnames attribute.

Implementing a Check in C++

With our understanding of the matrix structure, we can check:

  1. Does dimnames exist as an attribute on the matrix?
  2. Is the second entry in dimnames not NULL?

Note: This approach makes the original function a bit more verbose. The trade-off is that the function no longer needs a raw SEXP intermediate to hold the column names.

#include <Rcpp.h>

// Get column names or empty
// [[Rcpp::export]]
Rcpp::CharacterVector get_colnames(const Rcpp::NumericMatrix &x) {

  // Construct a character vector
  Rcpp::CharacterVector cn;

  // Create a numerical index for each column
  Rcpp::IntegerVector a = Rcpp::seq_len(x.ncol());
  // Coerce it to a character
  Rcpp::CharacterVector b = Rcpp::as<Rcpp::CharacterVector>(a);

  // Assign to character vector
  cn  = b;

  if(x.hasAttribute("dimnames")) {
    Rcpp::List dimnames = x.attr( "dimnames" ) ;

    if(dimnames.size() != 2) {
      Rcpp::stop("`dimnames` attribute must have a size of 2 instead of %s.", dimnames.size());
    }

    // Verify column names exist by checking for NULL
    if(!Rf_isNull(dimnames[1]) ) {
      // Retrieve colnames and assign to cn.
      cn = dimnames[1];
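    } else {
      // Assign the generated names to the matrix. Caution: this sets the
      // attribute on the underlying R object, so for a double matrix (passed
      // without coercion) the caller's matrix is modified in place.
      Rcpp::colnames(x) = cn;
    }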
  } 

  return(cn);
}

Testing the C++ variant

Calling the function now gives:

get_colnames(x_no_names)
#> [1] "1" "2"

get_colnames(x_named)
#> [1] "Col 1" "Col 2"

The first call returns the generated column indices, whereas the second retrieves the existing column names.