How to calculate the standard deviation per group in a data frame in R and plot it groupwise


I have my dataframe as:

T_ID   S1    S2
1      21    26
1      20    25
1      21    22
2      20    53
2      41    62
2      30    23

I have plotted S1 and S2 in the same graph, coloured by T_ID (T_ID "1" has one colour, T_ID "2" another, and so on). Now I want to plot the standard deviation per T_ID in the same graph. I can't figure out how to do this.


There are 2 answers

OFish (BEST ANSWER)

The answer I am providing uses three packages: tidyr, dplyr and ggplot2. It's a bit hacky, but I think it gives you the desired output. It does require you to get the data into the right format and to calculate the values that define the aesthetics before plotting, which is typical for ggplot2. If someone else has an easier way of doing this, I'd love to see it; right now this is the best I could come up with given the data provided.

First, get your data into the right format (assuming your data is called "df"), calculate the mean and standard deviation per T_ID (1, 2) and group (S1, S2), then plot the barplot with error bars representing mean +/- SD.

library(tidyr)
library(dplyr)
library(ggplot2)

# Reshape to long format: one row per T_ID, group (S1/S2) and measurement
df2 <- df %>% gather(group, measurement, S1:S2)
# Summarise to one row per T_ID and group, so each bar is drawn once at the group mean
df3 <- df2 %>% group_by(T_ID, group) %>% summarise(m = mean(measurement), sd = sd(measurement))
gg1 <- ggplot(df3, aes(x=group, y=m, fill=factor(T_ID)))
# Bars show the means; error bars show mean +/- SD
gg1 + geom_bar(width=0.4, position=position_dodge(width=0.5), stat="identity") +
  geom_errorbar(aes(ymin=m-sd, ymax=m+sd), position=position_dodge(width=0.5), width=0.4, size=0.1)

This gives you the following:

[Image: grouped bar plot with error bars]
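The reshape-and-summarise step can also be sketched with pivot_longer(), which the tidyr documentation recommends over gather() in newer code. This is only a sketch: it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, it rebuilds the example data from the question, and the names df_long and df_stats are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Example data copied from the question
df <- data.frame(
  T_ID = c(1, 1, 1, 2, 2, 2),
  S1   = c(21, 20, 21, 20, 41, 30),
  S2   = c(26, 25, 22, 53, 62, 23)
)

# Long format: one row per T_ID, group (S1 or S2) and measurement
df_long <- df %>%
  pivot_longer(cols = c(S1, S2), names_to = "group", values_to = "measurement")

# One row per T_ID and group, holding the mean and the standard deviation
df_stats <- df_long %>%
  group_by(T_ID, group) %>%
  summarise(m = mean(measurement), sd = sd(measurement), .groups = "drop")

print(df_stats)
```

The resulting df_stats has four rows (two T_IDs times two groups) and could be passed to ggplot() in place of the summarised data frame.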

EDIT FOLLOWING OP'S SPECIFICATIONS

First attempt that didn't work.

df4 <- df %>% group_by(T_ID) %>% mutate(SD1 = sd(S1)) %>% mutate(SD2 = sd(S2)) %>% mutate(mean_s1 = mean(S1)) %>% mutate(mean_s2 = mean(S2))

df4
Source: local data frame [6 x 7]
Groups: T_ID

  T_ID S1 S2        SD1       SD2  mean_s1  mean_s2
1    1 21 26  0.5773503  2.081666 20.66667 24.33333
2    1 20 25  0.5773503  2.081666 20.66667 24.33333
3    1 21 22  0.5773503  2.081666 20.66667 24.33333
4    2 20 53 10.5039675 20.420578 30.33333 46.00000
5    2 41 62 10.5039675 20.420578 30.33333 46.00000
6    2 30 23 10.5039675 20.420578 30.33333 46.00000

gg2 <- ggplot(df4, aes(x=S1, y=S2, fill=factor(T_ID)))
gg2 + geom_point(aes(col=factor(T_ID)))+geom_errorbar(aes(ymin=mean_s1-SD1, ymax=mean_s1+SD2))+geom_errorbarh(aes(xmin=mean_s2-SD2, xmax=mean_s2+S2))

### This doesn't really work: both error bar layers inherit x=S1 and y=S2, so bars are drawn at every point and end up all over the place

# Create a new data frame with plotting coordinates for geom_errorbar(). I tried this because the help page says you can supply a new data frame to geom_errorbar() to override the plot data, but it fails:

df2 <- df %>% group_by(T_ID) %>% summarise(mean_s1=mean(S1), sd_s1=sd(S1), mean_s2=mean(S2), sd_s2=sd(S2))
gg2 <- ggplot(df, aes(x=S1, y=S2, group=factor(T_ID), colour=factor(T_ID)))
gg2 + geom_point()+geom_errorbar(aes(ymax=mean_s1+sd_s1, ymin=mean_s1-sd_s1), data=df2)
Error in eval(expr, envir, enclos) : object 'S1' not found

# Doesn't work: the error bar layer still inherits x=S1 and y=S2 from ggplot(), and df2 has no such columns
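One possible way around the "object 'S1' not found" error is to stop the layer from inheriting the plot-level aesthetics by setting inherit.aes = FALSE and mapping everything it needs from the summary data frame. The sketch below is an assumption about what was intended, not a tested fix from the original answer. It rebuilds the example data, uses the made-up name grp_stats, and draws the horizontal bar with geom_segment() rather than geom_errorbarh().

```r
library(dplyr)
library(ggplot2)

# Example data copied from the question
df <- data.frame(
  T_ID = c(1, 1, 1, 2, 2, 2),
  S1   = c(21, 20, 21, 20, 41, 30),
  S2   = c(26, 25, 22, 53, 62, 23)
)

# One row per T_ID with the mean and SD of S1 and S2
grp_stats <- df %>%
  group_by(T_ID) %>%
  summarise(mean_s1 = mean(S1), sd_s1 = sd(S1),
            mean_s2 = mean(S2), sd_s2 = sd(S2))

p <- ggplot(df, aes(x = S1, y = S2, colour = factor(T_ID))) +
  geom_point() +
  # Vertical bar: SD of S2, drawn at the group mean of S1
  geom_errorbar(
    data = grp_stats,
    aes(x = mean_s1, ymin = mean_s2 - sd_s2, ymax = mean_s2 + sd_s2,
        colour = factor(T_ID)),
    width = 0.5, inherit.aes = FALSE
  ) +
  # Horizontal bar: SD of S1, drawn at the group mean of S2
  geom_segment(
    data = grp_stats,
    aes(x = mean_s1 - sd_s1, xend = mean_s1 + sd_s1,
        y = mean_s2, yend = mean_s2, colour = factor(T_ID)),
    inherit.aes = FALSE
  )

print(p)
```

Because grp_stats has one row per T_ID, each group gets exactly one cross centred on its mean, instead of one per data point.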

Second attempt.

EDIT - Possible solution to OP's question

df4 <- df %>% group_by(T_ID) %>% mutate(SD1 = sd(S1)) %>% mutate(SD2 = sd(S2)) %>% mutate(mean_s1 = mean(S1)) %>% mutate(mean_s2 = mean(S2))

    df4
    Source: local data frame [6 x 7]
    Groups: T_ID

      T_ID S1 S2        SD1       SD2  mean_s1  mean_s2
    1    1 21 26  0.5773503  2.081666 20.66667 24.33333
    2    1 20 25  0.5773503  2.081666 20.66667 24.33333
    3    1 21 22  0.5773503  2.081666 20.66667 24.33333
    4    2 20 53 10.5039675 20.420578 30.33333 46.00000
    5    2 41 62 10.5039675 20.420578 30.33333 46.00000
    6    2 30 23 10.5039675 20.420578 30.33333 46.00000
gg2 <- ggplot(df4, aes(x=S1, y=S2, fill=factor(T_ID)))
# Centre both error bars on the group means: vertical = SD of S2, horizontal = SD of S1
gg2 + geom_point(aes(col=factor(T_ID))) +
  geom_errorbar(aes(x=mean_s1, y=mean_s2, ymin=mean_s2-SD2, ymax=mean_s2+SD2, colour=factor(T_ID))) +
  geom_errorbarh(aes(x=mean_s1, y=mean_s2, xmin=mean_s1-SD1, xmax=mean_s1+SD1, colour=factor(T_ID)))

This gives you the plot below, where the error bars extend in both the x and y directions. I gather that with your real data the error bars will look more pleasing to the eye.

[Image: scatter plot with x and y error bars]

Karolis Koncevičius

Below is one simple solution, assuming your data frame is named df1:

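# Per-group means and standard deviations: rows are T_ID, columns are S1 and S2
means <- apply(df1[,2:3], 2, tapply, df1[,1], mean)
sds <- apply(df1[,2:3], 2, tapply, df1[,1], sd)
# Extend the y-axis so the tallest error bar (mean + SD) is not clipped
m <- barplot(means, beside=TRUE, ylim=c(0, 1.1 * max(means + sds)), legend.text=TRUE)
segments(m, means - sds, m, means + sds, lwd=2)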

This gives us:

[Image: bar plot with error bars]
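If only the per-group standard deviations are needed as a table, base R's aggregate() is another option. This is a small sketch that rebuilds the example data from the question under the name df1; sd_by_group is a made-up name.

```r
# Example data copied from the question
df1 <- data.frame(
  T_ID = c(1, 1, 1, 2, 2, 2),
  S1   = c(21, 20, 21, 20, 41, 30),
  S2   = c(26, 25, 22, 53, 62, 23)
)

# Standard deviation of S1 and S2 within each T_ID
sd_by_group <- aggregate(cbind(S1, S2) ~ T_ID, data = df1, FUN = sd)
print(sd_by_group)
```

The values should agree with the SD1 and SD2 columns printed in the other answer (about 0.577 and 10.504 for S1, 2.082 and 20.421 for S2).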