I am creating histograms of substitutions (1st, 2nd, or 3rd sub) over time. Each histogram shows the number of subs made in a given minute for a given sub number. The histograms make sense to me because, for the most part, they are smooth (I used a bin width of 1 minute), and nothing looks out of the ordinary. However, when I overlay a density plot, the left tail of the density is inflated in one of the graphs, and I cannot determine why.
The dataset is of substitutions, ranging from minute 1 to a maximum time. I then cut this dataset in half to look only at subs made after minute 45. I have not folded this data back. I have tried to create a reproducible example, but cannot given the data.
R code used to create the plot:

    ## Filter out subs that are not in the second half
    df.half <- df[df$PeriodId >= 2, ]

    p <- ggplot(data = df.half, aes(x = time)) +
      geom_histogram(aes(y = ..density..), position = "identity",
                     alpha = 0.5, binwidth = 1) +
      geom_vline(data = sumy.df.half, aes(xintercept = grp.mean),
                 color = "blue", linetype = "dashed", size = 1) +
      geom_density(alpha = .2) +
      facet_grid(SUB_NUMBER ~ .) +
      scale_y_continuous(limits = c(0, 0.075),
                         breaks = c(seq(0, 0.075, 0.025)),
                         minor_breaks = c(seq(0, 0.075, 0.025)),
                         name = 'Count')
    p
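Since I cannot share the real data, a simulated stand-in along these lines might let others run the plotting code. This is only a sketch: the column names (`PeriodId`, `SUB_NUMBER`, `time`, `grp.mean`) match mine, but the distributions are made up, and the structure of `sumy.df.half` (one mean per sub number) is an assumption for illustration. It may not reproduce the inflated tail.

```r
# Simulated stand-in for the real data; every value here is invented.
set.seed(1)
n <- 600

# Hypothetical second-half subs: later sub numbers tend to happen later.
second <- data.frame(
  PeriodId   = 2,
  SUB_NUMBER = sample(1:3, n, replace = TRUE)
)
second$time <- rnorm(n, mean = 55 + 8 * second$SUB_NUMBER, sd = 8)
# Round to whole minutes and clamp to minutes 46-95.
second$time <- pmin(pmax(round(second$time), 46), 95)

# A few hypothetical first-half subs so that the filter below does something.
first <- data.frame(
  PeriodId   = 1,
  SUB_NUMBER = 1,
  time       = sample(1:45, 50, replace = TRUE)
)

df <- rbind(first, second)

# Same filter as in the plotting code.
df.half <- df[df$PeriodId >= 2, ]

# Assumed structure of sumy.df.half: one mean time per SUB_NUMBER,
# stored in a column called grp.mean for geom_vline().
sumy.df.half <- aggregate(time ~ SUB_NUMBER, data = df.half, FUN = mean)
names(sumy.df.half)[names(sumy.df.half) == "time"] <- "grp.mean"
```

With `library(ggplot2)` loaded, the plotting code can then be run against `df.half` and `sumy.df.half` unchanged.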
Why, for the first sub, is the density plot inflated in the left tail if there are no values less than 45? And why isn't the density plot more inflated in the tail for the second sub?
Side note: I originally asked this question on Cross Validated, but was told to ask it here instead since it involves R.
So I was able to change the code and get the following:
    ggplot() +
      geom_histogram(data = df.half, aes(x = time, y = ..density..),
                     position = "identity", alpha = 0.5, binwidth = 1) +
      geom_density(data = df.half, aes(x = time, y = ..density..)) +
      geom_vline(data = sumy.df.half, aes(xintercept = grp.mean),
                 color = "blue", linetype = "dashed", size = 1) +
      facet_grid(SUB_NUMBER ~ .)
This looks more correct and at least now fits the dataset. However, I am still confused as to why those issues occurred in the first place.