I have a fairly large data set (4000 obs of 149 variables), and I would like to look at the percentile rank of many of these variables. I believe I have been able to generate the percentile ranks, ignoring NA values, with the following code:
prank <- function(x) {
  r <- rank(x) / sum(!is.na(x)) * 100  # rank scaled by the number of non-missing values
  r[is.na(x)] <- NA                    # missing inputs stay missing
  r
}
My question is: how can I automate applying this function to the columns I am interested in, returning a new column with the ranks for each? I tried this:
y <- data.frame(x, t(apply(-x,1,prank)))
But this appears to group everything together and establish the ranks. I essentially want to be able to do the following on ~100 different columns:
y$V5.pr <- prank(x$V5)
If you want the percentile ranks to span the full interval 0-100, consider subtracting 1 from both the numerator and the denominator of r:
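A minimal sketch of that adjustment, assuming the same approach as prank in the question. The name prank0 is hypothetical, chosen here only to keep it apart from the original function:

```r
# Hypothetical variant of prank(): the smallest value maps to 0, the largest to 100.
prank0 <- function(x) {
  n <- sum(!is.na(x))                        # number of non-missing values
  r <- (rank(x, na.last = "keep") - 1) / (n - 1) * 100
  r                                          # NA inputs stay NA because of na.last = "keep"
}

prank0(c(10, 20, NA, 30))
```

Note that a column with a single non-missing value gives 0/0 here, so it would need separate handling.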
Another possibility, with x as the data frame whose numeric variables are to be converted to percentile ranks:
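One way this could look in base R, as a sketch rather than a definitive answer. It assumes prank is defined as in the question and that every column of x is numeric; the toy data frame below is made up for illustration:

```r
prank <- function(x) {
  r <- rank(x) / sum(!is.na(x)) * 100
  r[is.na(x)] <- NA
  r
}

# Hypothetical stand-in for the real x.
x <- data.frame(V1 = c(3, 1, 2, NA), V2 = c(10, 40, 20, 30))

# lapply() applies prank() column by column; assigning into y[] keeps
# the data-frame shape and the original column names.
y <- x
y[] <- lapply(x, prank)
y
```

Because lapply works column by column, each variable is ranked on its own, which avoids the row-wise ranking that apply(x, 1, ...) performs.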
Or the option that keeps the original columns and adds the ranks as new, named columns:
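A sketch of how the named columns might be added in base R, following the V5.pr naming from the question. The vector cols and the toy data are assumptions for illustration; in practice cols would hold the ~100 column names of interest:

```r
prank <- function(x) {
  r <- rank(x) / sum(!is.na(x)) * 100
  r[is.na(x)] <- NA
  r
}

# Hypothetical stand-in for the real x.
x <- data.frame(V5 = c(2, NA, 8, 4), V6 = c(5, 5, 1, 9), id = 1:4)

cols <- c("V5", "V6")   # hypothetical selection of columns to rank

# Keep the original columns and append one ".pr" column per selected variable.
y <- x
y[paste0(cols, ".pr")] <- lapply(x[cols], prank)
names(y)
```

Columns not listed in cols (here id) are carried over untouched.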