Create new columns for percentile rank of numerous columns in a data frame in R


I have a fairly large data set (4000 observations of 149 variables), and I would like to look at the percentile rank of many of these variables. I believe the following function generates the percentile ranks correctly while ignoring NA values:

    prank <- function(x) {
      r <- rank(x) / sum(!is.na(x)) * 100
      r[is.na(x)] <- NA
      r
    }

My question is how to automate applying this function to the columns I am interested in, so that each one gets a new column holding its ranks. I tried this:

    y <- data.frame(x, t(apply(-x,1,prank)))

But this appears to group everything together and establish the ranks. I essentially want to be able to do the following on ~100 different columns:

    y$V5.pr <- prank(x$V5)

There is 1 answer

Dion Stat

If you want the percentile ranks to span the full interval 0-100, consider subtracting 1 from both the numerator and the denominator of r:

    prank <- function(x) {
      r <- (rank(x) - 1) / (sum(!is.na(x)) - 1) * 100
      r[is.na(x)] <- NA
      return(r)
    }
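
To make the effect of subtracting 1 concrete, here is a small comparison of the two definitions on a made-up vector. The names `prank_orig` and `prank_scaled` and the sample values are only for illustration; `na.last = "keep"` is written out explicitly so that missing values stay `NA` in the result.

```r
# Definition from the question: rank divided by the number of non-missing values.
prank_orig <- function(x) {
  rank(x, na.last = "keep") / sum(!is.na(x)) * 100
}

# Rescaled definition: the smallest value maps to 0 and the largest to 100.
prank_scaled <- function(x) {
  (rank(x, na.last = "keep") - 1) / (sum(!is.na(x)) - 1) * 100
}

v <- c(10, NA, 30, 20, 40)  # made-up example data with one missing value

prank_orig(v)    # 25 NA 75 50 100
prank_scaled(v)  # 0 NA 66.67 33.33 100 (rounded)
```

With the first definition the smallest value never reaches 0 (here it is 25); with the second the results cover the whole 0-100 range.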

To apply the function to every column at once, assuming x is a data frame that contains only numeric variables (the result is a matrix of percentile ranks):

    y <- apply(x, 2, prank)

Or, to add the ranks to x as new columns named with a ".pr" suffix:

    x[, paste0(names(x), ".pr")] <- apply(x, 2, prank)
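
Since the question asks for only about 100 of the 149 columns, one possible variation is to list the columns of interest and use `lapply()` on just those. This is a sketch under assumptions: the data frame, the `id` column, and the `cols` vector below are made up for illustration. `apply()` converts the data frame to a matrix first, so a non-numeric column would turn every value into a character string; `lapply()` handles each column separately and avoids that.

```r
prank <- function(x) {
  # rank() leaves NA in place with na.last = "keep"
  (rank(x, na.last = "keep") - 1) / (sum(!is.na(x)) - 1) * 100
}

# Made-up data frame standing in for the real one (one non-numeric column).
x <- data.frame(
  id = c("a", "b", "c", "d"),
  V5 = c(3, 1, NA, 2),
  V6 = c(10, 40, 20, 30)
)

# Hypothetical selection of the columns to rank.
cols <- c("V5", "V6")

# Rank each selected column and store the result as <name>.pr.
x[paste0(cols, ".pr")] <- lapply(x[cols], prank)

x$V5.pr  # 100 0 NA 50
```

The vector `cols` could also be built programmatically, for example with `names(x)[sapply(x, is.numeric)]` to pick every numeric column.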