I have a list of 6 data frames.
Each has 3 columns:
id, T_C, Sales
T_C is TEST or CONTROL
Someone helped me here, and I learned how to find the mean() and sd() by looping instead of writing individual statements.
Now my goal is to remove the outliers from each of the 6 data frames and produce a new list of 6 (with the outliers removed).
str(dfList) # this is the list of 6 data frames
I am able to get the mean() and sd() of each data frame like this:
list_mean_sd <- lapply(dfList,
  function(df)
  {
    df %>%
      group_by(TC_INDICATOR) %>%
      summarise(mean = mean(NET_SPEND),
                sd = sd(NET_SPEND))
  })
> str(list_mean_sd)
List of 6 (1 obs. of 2 variables:)
I can select the mean or sd individually:
sapply(list_mean_sd, "[", "mean")
sapply(list_mean_sd, "[", "sd")
Basically, my goal is to identify the outliers and remove them, producing an alternative "after" set.
Outliers are values below mean - 3*sd or above mean + 3*sd.
I have this done already, but with more manual steps. I am looking to learn how to loop through these sets instead. Thanks in advance for helping me!
Give this a shot. First I create data, which I split into six data frames that are housed in a list. Then, I use lapply on this list to compute what I'm calling the z_scores: the difference between the mean of Sales and each individual Sales value, divided by the sd of Sales. Finally, we use filter on these to pull out the rows which have a z_score with an absolute value of 3 or more.
Obviously, this will give you only the outliers. If you want to keep only the inliers, you change the >= 3 to a < 3.
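The steps described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code originally posted: the example data (600 rows with one planted extreme value per data frame), the helper name add_z, and the choice to compute the z-score over each whole data frame rather than per T_C group are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical example data: columns id, T_C, Sales (values are made up).
set.seed(42)
df <- data.frame(
  id    = 1:600,
  T_C   = rep(c("TEST", "CONTROL"), times = 300),
  Sales = rnorm(600, mean = 100, sd = 15)
)
# Plant one extreme value in each block of 100 rows so there is something to remove.
df$Sales[seq(1, 600, by = 100)] <- 1000

# Split into a list of six data frames of 100 rows each.
dfList <- split(df, rep(1:6, each = 100))

# Hypothetical helper: add a z_score column computed within one data frame.
add_z <- function(d) {
  d %>% mutate(z_score = (Sales - mean(Sales)) / sd(Sales))
}

scored <- lapply(dfList, add_z)

# Outliers: |z_score| >= 3. Inliers: |z_score| < 3.
outliers <- lapply(scored, function(d) dplyr::filter(d, abs(z_score) >= 3))
inliers  <- lapply(scored, function(d) dplyr::filter(d, abs(z_score) < 3))

# Number of rows kept in each of the six data frames.
sapply(inliers, nrow)
```

If the mean and sd should be computed separately for TEST and CONTROL, as in the question's summarise() call, adding group_by(T_C) before mutate() (and ungroup() afterwards) should do that.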
Updated to get Wilcox test on inliers
We just run lapply on the list of inliers using the parameters noted in OP's comment.
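A rough sketch of that step is below. The parameters from the OP's comment are not reproduced here, so the call wilcox.test(Sales ~ T_C, data = d) with default arguments is an assumption, as is the small stand-in inliers list that keeps the example self-contained.

```r
# Hypothetical stand-in for the list of inlier data frames (values are made up).
set.seed(1)
inliers <- lapply(1:6, function(i) {
  data.frame(
    id    = 1:40,
    T_C   = rep(c("TEST", "CONTROL"), each = 20),
    Sales = rnorm(40, mean = 100, sd = 15)
  )
})

# Assumed call: Wilcoxon rank-sum test of Sales between TEST and CONTROL,
# run once per data frame with default arguments.
wilcox_results <- lapply(inliers, function(d) wilcox.test(Sales ~ T_C, data = d))

# Collect the p-value from each of the six tests.
p_values <- sapply(wilcox_results, function(res) res$p.value)
p_values
```

Each element of wilcox_results is a standard htest object, so other pieces such as res$statistic can be pulled out the same way.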