I would like to select specific elements of a `data.list` after processing it. To explain the process parameters, I describe my problem in the reproducible example below.

In the example code, `data.list` holds three data sets, each with 5 columns. Each data set repeats itself three times, and each is assigned a unique number called `set_nbr`, which identifies it.
# Create reproducible data: three data sets, each repeating its Mx, My and Mz
# values 3 times, tagged with set_nbr
set.seed(1)
data.list <- lapply(1:3, function(x) {
  nrep <- 3
  time <- rep(seq(90, 54000, length.out = 600), times = nrep)
  Mx <- c(replicate(nrep, sort(runif(600, -0.014, 0.012), decreasing = TRUE)))
  My <- c(replicate(nrep, sort(runif(600, -0.02, 0.02), decreasing = TRUE)))
  Mz <- c(replicate(nrep, sort(runif(600, -1, 1), decreasing = TRUE)))
  df <- data.frame(time, Mx, My, Mz, set_nbr = x)
})
After applying some function, I have output like this:
result
    time            Mz set_nbr
1  27810 -1.917835e-03       1
2  28980 -1.344288e-03       1
3  28350 -3.426615e-05       1
4  27900 -9.934413e-04       1
5  25560 -1.016492e-02       2
6  27360 -4.790767e-03       2
7  28080 -7.062256e-04       2
8  26550 -1.171716e-04       2
9  26820 -2.495893e-03       3
10 26550 -7.397865e-03       3
11 26550 -2.574022e-03       3
12 27990 -1.575412e-02       3
My questions start here.

1) How do I get the min, middle and max values of the `time` column for each `set_nbr`?
2) How do I use the evaluated `set_nbr` and `Mz` values inside `data.list`?
In short: after deciding the min, middle and max values from the `time` column and the corresponding `Mz` values for each `set_nbr` in `result`, I want to go back to the original `data.list` and extract the `Mx`, `My` and `Mz` columns according to those `set_nbr` and `Mz` values. Since each `set_nbr` actually corresponds to 600 rows, I would like to extract the whole family of rows for those `set_nbr`s from `data.list`; we use `time` as a factor to select `set_nbr`. Here "factor" means an extraction parameter, not a real `factor` in the R sense.
In addition, as you will see, four `set_nbr` entries exist for each data set in `result`, but they actually refer to different data sets in `data.list`.
I'm a big advocate of using lists of data frames when appropriate, but in this case it doesn't look like there's any reason to keep them separated as different list items. Let's combine them into a single data frame.
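The code that originally accompanied this step is not preserved here, so the following is only a minimal sketch of one way to do the combining. It assumes base R's `do.call(rbind, ...)`; `dplyr::bind_rows(data.list)` would be another option. The name `dat` is the one used later in this answer, and the data creation is copied from the question.

```r
# Recreate the question's sample data (same code as in the question).
set.seed(1)
data.list <- lapply(1:3, function(x) {
  nrep <- 3
  time <- rep(seq(90, 54000, length.out = 600), times = nrep)
  Mx <- c(replicate(nrep, sort(runif(600, -0.014, 0.012), decreasing = TRUE)))
  My <- c(replicate(nrep, sort(runif(600, -0.02, 0.02), decreasing = TRUE)))
  Mz <- c(replicate(nrep, sort(runif(600, -1, 1), decreasing = TRUE)))
  data.frame(time, Mx, My, Mz, set_nbr = x)
})

# Stack the three data frames row-wise; set_nbr already records which
# list element each row came from, so no information is lost.
dat <- do.call(rbind, data.list)

dim(dat)            # 5400 rows, 5 columns
table(dat$set_nbr)  # 1800 rows per set_nbr
```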
Then getting your summary stats is easy:
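As a hedged illustration of that summary step, here is a base R sketch that computes the min, median and max of `time` per `set_nbr`. The column names `min_time`, `mid_time` and `max_time` and the object `time_summary` are made up for this example, and "middle" is assumed to mean the median.

```r
# Recreate the question's sample data and combine it into one data frame.
set.seed(1)
data.list <- lapply(1:3, function(x) {
  nrep <- 3
  time <- rep(seq(90, 54000, length.out = 600), times = nrep)
  Mx <- c(replicate(nrep, sort(runif(600, -0.014, 0.012), decreasing = TRUE)))
  My <- c(replicate(nrep, sort(runif(600, -0.02, 0.02), decreasing = TRUE)))
  Mz <- c(replicate(nrep, sort(runif(600, -1, 1), decreasing = TRUE)))
  data.frame(time, Mx, My, Mz, set_nbr = x)
})
dat <- do.call(rbind, data.list)

# Split by set_nbr, summarise each piece, and bind the one-row results.
time_summary <- do.call(rbind, lapply(split(dat, dat$set_nbr), function(d) {
  data.frame(
    set_nbr  = d$set_nbr[1],
    min_time = min(d$time),
    mid_time = median(d$time),  # assumption: "middle" means the median
    max_time = max(d$time)
  )
}))
time_summary
```

The same function body could be applied to the `result` object from the question instead of `dat` if the summary is wanted for the processed rows only.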
In your sample data, `time` is defined the same way each time, so of course the min, median, and max are all the same. I'd suggest, in the new question you ask about plotting, starting with the combined data frame `dat`.

As to your second question:
To select a single item from a list, use double brackets:
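A short sketch of the difference between double and single brackets, using a small stand-in for the question's `data.list` (the values here are illustrative only, not the question's data):

```r
# Small stand-in for the question's data.list (illustrative values only).
data.list <- lapply(1:3, function(x) {
  data.frame(time = c(90, 180), Mz = c(0.5, -0.5), set_nbr = x)
})

# Double brackets return the element itself: here, a data frame.
first_set <- data.list[[1]]
class(first_set)

# Single brackets return a list of length one that contains the element.
first_as_list <- data.list[1]
class(first_as_list)

# Columns of a single element can then be reached in the usual way.
data.list[[2]]$Mz
data.list[[2]]$set_nbr
```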
However, with the combined data, it's just a normal column of a normal data frame so any of these will work:
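The specific expressions from the original answer are not preserved here; the following sketch shows the standard ways to pull one column out of a data frame, again on a small stand-in for `dat` with illustrative values:

```r
# Small stand-in for the combined data frame dat (illustrative values only).
dat <- data.frame(
  time    = rep(c(90, 180), times = 3),
  Mz      = c(0.5, -0.5, 0.3, -0.3, 0.1, -0.1),
  set_nbr = rep(1:3, each = 2)
)

# Three equivalent ways to get the Mz column as a vector.
mz_dollar  <- dat$Mz
mz_double  <- dat[["Mz"]]
mz_bracket <- dat[, "Mz"]

# Rows for one set_nbr can be selected with a logical condition.
set2 <- dat[dat$set_nbr == 2, c("time", "Mz")]
set2
```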
To your clarification in comments, if you want the Mx and My values for the `time` and `set_nbr` in the `results` object, using my combined `dat` above, simply do a join: `left_join(results, dat)`.

This should work, but I'm a little confused because in your simulated data `time` is numeric, but in your new text you say "we use `time` as a `factor`". If you've converted time to a factor object, this will only work if it has the same `levels` in each of the data frames in your data list. If not, I would recommend keeping `time` as `numeric`.
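To make the join concrete, here is a minimal sketch that swaps in base R's `merge(..., all.x = TRUE)` for `dplyr::left_join()`, so it runs without extra packages. Both `dat` and `results` below are small hypothetical stand-ins, not the question's data. The join is done on all shared columns (`time`, `Mz`, `set_nbr`), which is also what `left_join(results, dat)` does by default when no `by` is given.

```r
# Hypothetical stand-in for the combined dat: 3 sets, 5 time points each.
set.seed(1)
dat <- data.frame(
  time    = rep(seq(90, 450, by = 90), times = 3),
  Mx      = runif(15, -0.014, 0.012),
  My      = runif(15, -0.02, 0.02),
  Mz      = runif(15, -1, 1),
  set_nbr = rep(1:3, each = 5)
)

# Hypothetical results object: a few rows of dat with only time, Mz, set_nbr.
results <- dat[c(2, 8, 14), c("time", "Mz", "set_nbr")]

# Left join in base R: keep every row of results and attach Mx and My
# from the matching rows of dat.
joined <- merge(results, dat, by = c("time", "Mz", "set_nbr"), all.x = TRUE)
joined
```

Matching on `Mz` relies on the values in `results` being unchanged copies of those in `dat`; values retyped from rounded console output would likely not match exactly.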