How to select specific elements and find their index in a data.frame?


I would like to select specific elements of a data.list after processing it.

To show the processing steps, I describe my problem in the reproducible example below. In the example code, I have three data sets in data.list, each with 5 columns.

Each data set repeats itself three times, and each one is assigned a unique number called set_nbr that identifies it.

# Create reproducible data: three data sets, each repeating its Mx, My and Mz values 3 times, tagged with set_nbr
set.seed(1)
data.list <- lapply(1:3, function(x) {
  nrep <- 3
  time <- rep(seq(90, 54000, length.out = 600), times = nrep)
  Mx <- c(replicate(nrep, sort(runif(600, -0.014, 0.012), decreasing = TRUE)))
  My <- c(replicate(nrep, sort(runif(600, -0.02, 0.02), decreasing = TRUE)))
  Mz <- c(replicate(nrep, sort(runif(600, -1, 1), decreasing = TRUE)))
  data.frame(time, Mx, My, Mz, set_nbr = x)
})

After applying some function, I have output like this:

 result

       time     Mz           set_nbr
 1  27810 -1.917835e-03       1
 2  28980 -1.344288e-03       1
 3  28350 -3.426615e-05       1
 4  27900 -9.934413e-04       1
 5  25560 -1.016492e-02       2
 6  27360 -4.790767e-03       2
 7  28080 -7.062256e-04       2
 8  26550 -1.171716e-04       2
 9  26820 -2.495893e-03       3
 10 26550 -7.397865e-03       3
 11 26550 -2.574022e-03       3
 12 27990 -1.575412e-02       3  

My questions start here.

1) How do I get the min, middle and max values of the time column for each set_nbr?

2) How do I use the evaluated set_nbr and Mz values inside data.list?

In short;

After finding the min, middle and max values of the time column and the corresponding Mz values for each set_nbr in result, I want to go back to the original data.list and extract the Mx, My and Mz columns that match those set_nbr and Mz values. Since each set_nbr actually corresponds to 600 rows, I would like to extract the whole family of rows for the selected set_nbr values from data.list.

We use time as a factor to select set_nbr. Here "factor" means an extraction parameter, not an R factor object.

In addition, as you will see, four set_nbr entries exist for each dataset, but they actually refer to different datasets in data.list.

Best answer, by Gregor Thomas

I'm a big advocate of using lists of data frames when appropriate, but in this case it doesn't look like there's any reason to keep them separated as different list items. Let's combine them into a single data frame.

library(dplyr)
dat = bind_rows(data.list)
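If dplyr is not available, base R can do the same stacking with do.call(rbind, ...). The sketch below is only an illustration: it uses a much smaller toy data.list with made-up values (an assumption for brevity, not the question's data) and checks how many rows each set_nbr contributes.

```r
# Toy stand-in for the question's data.list (far fewer rows, made-up values)
set.seed(1)
data.list <- lapply(1:3, function(x) {
  data.frame(time = seq(90, 540, length.out = 4),
             Mx = runif(4), My = runif(4), Mz = runif(4),
             set_nbr = x)
})

# Base R equivalent of dplyr::bind_rows(data.list)
dat <- do.call(rbind, data.list)

nrow(dat)            # 12: 3 data frames x 4 rows
table(dat$set_nbr)   # 4 rows for each set_nbr
```

With the data.list from the question, each data frame has 1800 rows (600 time points repeated nrep = 3 times), so the combined dat would have 5400 rows.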

Then getting your summary stats is easy:

dat %>% group_by(set_nbr) %>%
    summarize(min_time = min(time),
              max_time = max(time),
              middle_time = median(time))

# Source: local data frame [3 x 4]
#
#   set_nbr min_time max_time middle_time
# 1       1       90    54000       27045
# 2       2       90    54000       27045
# 3       3       90    54000       27045

In your sample data, time is defined the same way each time, so of course the min, median, and max are all the same.
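The same kind of summary becomes more informative on the processed table from the question, where the times differ between sets. The following is a base R sketch (split plus lapply instead of group_by plus summarize); the result table is typed in by hand from the question, and the names time_summary and min_rows are mine.

```r
# The `result` table from the question, typed in by hand
result <- data.frame(
  time = c(27810, 28980, 28350, 27900,
           25560, 27360, 28080, 26550,
           26820, 26550, 26550, 27990),
  Mz = c(-1.917835e-03, -1.344288e-03, -3.426615e-05, -9.934413e-04,
         -1.016492e-02, -4.790767e-03, -7.062256e-04, -1.171716e-04,
         -2.495893e-03, -7.397865e-03, -2.574022e-03, -1.575412e-02),
  set_nbr = rep(1:3, each = 4)
)

# One summary row per set_nbr
time_summary <- do.call(rbind, lapply(split(result, result$set_nbr), function(d) {
  data.frame(set_nbr = d$set_nbr[1],
             min_time = min(d$time),
             max_time = max(d$time),
             middle_time = median(d$time))
}))
time_summary

# Rows (including Mz) that hold the minimum time of their set; ties return several rows
min_rows <- result[result$time == ave(result$time, result$set_nbr, FUN = min), ]
min_rows
```

Note that with four rows per set the median falls between two observed times, so it does not correspond to any single row or Mz value. If "middle" should be an actual row, it has to be picked explicitly, for example the second or third row after sorting by time.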

I'd suggest, in the new question you ask about plotting, starting with the combined data frame dat.

As to your second question:

2) How to select evaluated set_nbr values inside of data.list?

To select a single item from a list, use double brackets:

data.list[[2]]
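The difference between double and single brackets is easy to miss, so here is a small sketch on a toy list (made-up values, not the question's data):

```r
# Toy list of two small data frames (made-up values)
data.list <- list(data.frame(a = 1:2), data.frame(a = 3:4))

class(data.list[[2]])  # "data.frame": the element itself
class(data.list[2])    # "list": a list of length 1 that contains the element

# Columns are reachable directly only after extracting with [[ ]]
data.list[[2]]$a
```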

However, with the combined data, it's just a normal column of a normal data frame so any of these will work:

dat[dat$set_nbr == 2, ]
subset(dat, set_nbr == 2)
filter(dat, set_nbr == 2)
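Since the question title also asks for the index of the selected elements, which() can return the row positions instead of the rows themselves. A minimal sketch on a toy dat (made-up values; idx is just an illustrative name):

```r
# Toy combined data frame (made-up values)
dat <- data.frame(time = rep(c(90, 180, 270), times = 2),
                  Mz = c(0.5, 0.1, -0.3, 0.8, 0.2, -0.6),
                  set_nbr = rep(1:2, each = 3))

# Row positions where set_nbr is 2
idx <- which(dat$set_nbr == 2)
idx   # 4 5 6

# Indexing with those positions selects the same rows as the logical filter
dat[idx, ]
```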

To your clarification in comments: if you want the Mx and My values for the time and set_nbr in the result object, using my combined dat above, simply do a join: left_join(result, dat).
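As a sketch of that join with explicit keys, here is a base R version using merge() on toy stand-ins for dat and for the processed table (called result in the question). The values are made up, and dropping Mz from dat before joining is my assumption, so that only time and set_nbr are used for matching.

```r
# Toy stand-ins (made-up values): `dat` is the combined data, `result` the processed table
dat <- data.frame(time = rep(c(90, 180, 270), times = 2),
                  Mx = c(0.01, 0.00, -0.01, 0.02, 0.01, -0.02),
                  My = c(0.02, 0.01, -0.02, 0.01, 0.00, -0.01),
                  Mz = c(0.5, 0.1, -0.3, 0.8, 0.2, -0.6),
                  set_nbr = rep(1:2, each = 3))
result <- data.frame(time = c(180, 270),
                     Mz = c(0.1, -0.6),
                     set_nbr = c(1L, 2L))

# Keep every row of `result` and attach Mx and My, matching on time and set_nbr only
joined <- merge(result,
                dat[, c("time", "set_nbr", "Mx", "My")],
                by = c("time", "set_nbr"),
                all.x = TRUE)
joined
```

Two things are worth checking on real data. A dplyr join without a by argument matches on every shared column name, which here would include Mz. Also, in the question's sample data each time value occurs nrep = 3 times per set_nbr, so each row of the processed table would match three rows of dat.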

This should work, but I'm a little confused because in your simulated data time is numeric, but in your new text you say "we use time as a factor". If you've converted time to a factor object, this will only work if it has the same levels in each of the data frames in your data list. If not, I would recommend keeping time as numeric.
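If time has ended up as a factor, converting it back needs some care, because as.numeric() on a factor returns the internal level codes rather than the original values. A short sketch with made-up values (time_num and time_fac are illustrative names):

```r
time_num <- c(27810, 28980, 90)
time_fac <- factor(time_num)

as.numeric(time_fac)                 # 2 3 1: internal level codes, not the times
as.numeric(as.character(time_fac))   # 27810 28980 90: the original values
```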