Apply function iteratively across a dataframe


I have a two-part question about applying a function across a dataset in R.

(i) First, I have two data frames whose columns I would like to combine in interleaved pairs, so that the 1st columns of each data frame sit next to each other, then the 2nd columns, and so on, much like a cbind applied column by column. In the example below, I would like an output combining df1 and df2 in which the column order is eg1, eg4, eg2, eg5, eg3, eg6.

eg1 <- as.data.frame(matrix(sample(0:1000, 36*10, replace=TRUE), ncol=1))
eg2 <- as.data.frame(matrix(sample(0:500, 36*10, replace=TRUE), ncol=1))
eg3 <- as.data.frame(matrix(sample(0:750, 36*10, replace=TRUE), ncol=1))
df1 <- cbind(eg1,eg2,eg3)
eg4 <- as.data.frame(matrix(sample(0:200, 36*10, replace=TRUE), ncol=1))
eg5 <- as.data.frame(matrix(sample(0:100, 36*10, replace=TRUE), ncol=1))
eg6 <- as.data.frame(matrix(sample(0:350, 36*10, replace=TRUE), ncol=1))
df2 <- cbind(eg4,eg5,eg6)

I know a manual way of doing this (below), but it would not be practical when combining much larger datasets. Is there a more efficient way of achieving this?

df3 <- cbind(df1,df2)
df3 <- df3[,c(1,4,2,5,3,6)]

(ii) Following this, I would like to output seven values from each odd column, taken from the rows with the 7 highest values in the corresponding even column. As an example, for the first two columns:

df4 <- df3[,1:2]
High_7 <- tail(df4[order(df4[,2]),],7)#Highest 7 values in even column
High_7 <- High_7[,1] #Select odd column values

Ideally, though, I would like to do this across the whole dataset, perhaps through some form of apply function.


There is 1 answer

grrgrrbla (best answer)

For your first question, combining the columns of both data frames pairwise (note that this only works if the column names within each data frame are unique, which they are NOT in your OP, where every column is named V1, so selecting by name always returns the first column):

df3 <- Reduce(cbind,
       Map(function(x, y) cbind(df1[x], df2[y]), names(df1), names(df2))) 
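
If the column names are not unique, indexing by position instead of by name sidesteps the issue. Below is a minimal sketch of that idea, assuming both data frames have the same number of columns. The small data frames `a` and `b` and their column names are made up for illustration and are not from the question:

```r
# Two small illustrative data frames with the same number of columns
a <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9)
b <- data.frame(y1 = 11:13, y2 = 14:16, y3 = 17:19)

# In cbind(a, b), the columns of a sit at positions 1..ncol(a) and the
# columns of b at ncol(a)+1..ncol(a)+ncol(b).
# rbind() stacks the two position vectors as rows; as.vector() then reads
# the matrix column by column, which interleaves them: 1, 4, 2, 5, 3, 6.
idx <- as.vector(rbind(seq_along(a), ncol(a) + seq_along(b)))

interleaved <- cbind(a, b)[, idx]
names(interleaved)
```

For this toy input the column order comes out as x1, y1, x2, y2, x3, y3. The same `idx` construction should apply to `df1` and `df2` from the question, since it never looks at the column names.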

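For the second part, I would use this, which sorts the rows by each even column in decreasing order and takes the first 7 values of the odd column to its left: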

results <- sapply(seq(1, ncol(df3), 2),
                  function(i) df3[order(df3[, i + 1], decreasing = TRUE), ][1:7, i])

If you want the results as a data.frame rather than a matrix, just do:

results <- data.frame(results)
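
To make the second step easier to check by hand, here is a small self-contained sketch of the same pattern on made-up data. The data frame `pairs_df`, its column names, and `n_top` are hypothetical names introduced for this illustration, and `n_top` is set to 2 instead of 7 only to keep the example readable:

```r
# Illustrative data: odd columns hold the values to report,
# even columns hold the values to rank by.
pairs_df <- data.frame(
  val1 = c(10, 20, 30, 40),
  key1 = c(3, 1, 4, 2),
  val2 = c(5, 6, 7, 8),
  key2 = c(9, 7, 8, 6)
)

n_top <- 2  # the question uses 7

odd_cols <- seq(1, ncol(pairs_df), by = 2)

top_vals <- sapply(odd_cols, function(i) {
  # Row order that sorts the even (key) column from highest to lowest
  ord <- order(pairs_df[[i + 1]], decreasing = TRUE)
  # Odd-column values found in the first n_top of those rows
  pairs_df[[i]][head(ord, n_top)]
})
colnames(top_vals) <- names(pairs_df)[odd_cols]
top_vals
```

Here the two highest values of key1 are in rows 3 and 1, so the val1 column of the result is 30, 10; the two highest values of key2 are in rows 1 and 3, so the val2 column is 5, 7. Note that `order()` keeps tied key values in their original row order, so which rows make the cut can depend on row position when there are ties.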