Normal Mixture Distribution


I am trying to create a qqplot and run a KS test for a normal mixture distribution with 25% N(μ=0,σ=4) and 75% N(μ=4,σ=2). How could I adapt my qqplot and KS test for this distribution? I don't think my abline is correct and my KS test doesn't really reflect the distribution correctly.

Any help would be appreciated.

set.seed(4711)
n = 500
P = ppoints(n)
Q = qnorm(P)

dt <- sample(c(1,2), prob= c(0.25,0.75), size = n, replace = T)
x <- c()
for(i in 1:n){
  if(dt[i] == 1) x[i]=rnorm(1, mean = 0, sd = 4) else x[i] = rnorm(1, mean = 4, sd = 2)
}

hist(x, prob = T, breaks = 27, col = "lightgreen", main = "Mixture Normal")
curve(0.25*dnorm(x, mean = 0, sd = 4) + 0.75*dnorm(x, mean = 4, sd = 2), add = T, col = 2, lwd = 3, lty = 2)

qqplot(Q, x)
abline(0,1)


ks.test(x, 'pnorm')

Answer by IRTFM (best answer)

The way to get a more sensible qqplot, i.e. one where the straight line representing the "theoretical" distribution (or the empirical one, in the case of a two-sample version, as here) is a meaningful reference, is to scale the arguments properly. A "qqplot" for a one-sample KS test is really "semi-parametric", i.e. the mean and standard deviation of the sample under test are first extracted and then used to scale the plot of the order statistics. So do this:

 qqplot(Q, scale(x) )  # make the mean 0 and the SD=1
 abline(0,1)
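
If the goal is to compare the sample with the mixture itself rather than with a standard normal, one option is to plot against the mixture's own quantiles. The following is only a sketch under the assumptions stated in the question (25% N(0, sd = 4) and 75% N(4, sd = 2)). The names `pmix` and `qmix` are hypothetical helpers, not functions from any package, and `qmix` inverts the CDF numerically with `uniroot` over an assumed search interval of (-50, 50). The sample is regenerated in vectorized form, so it may not reproduce exactly the same draws as the loop in the question.

```r
# Mixture CDF: 25% N(0, sd = 4) + 75% N(4, sd = 2), as stated in the question
pmix <- function(q) {
  0.25 * pnorm(q, mean = 0, sd = 4) + 0.75 * pnorm(q, mean = 4, sd = 2)
}

# Hypothetical helper: numerically invert pmix, one probability at a time.
# The interval c(-50, 50) is an assumption that comfortably brackets this mixture.
qmix <- function(p) {
  sapply(p, function(p_i) {
    uniroot(function(q) pmix(q) - p_i, interval = c(-50, 50), tol = 1e-9)$root
  })
}

set.seed(4711)
n <- 500
# Draw the component label first, then the normal variate for that component
comp <- sample(1:2, size = n, replace = TRUE, prob = c(0.25, 0.75))
x <- rnorm(n, mean = c(0, 4)[comp], sd = c(4, 2)[comp])

# Theoretical mixture quantiles at the same plotting positions as in the question
Qmix <- qmix(ppoints(n))

qqplot(Qmix, x, xlab = "Mixture quantiles", ylab = "Sample quantiles")
abline(0, 1)  # both axes are on the data scale, so the identity line is the reference
```

With this version no rescaling of `x` is needed, because the theoretical quantiles already carry the location and spread of the mixture.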


ks.test(x, 'pnorm')
#------------------
    One-sample Kolmogorov-Smirnov test

data:  x
D = 0.70763, p-value < 2.2e-16
alternative hypothesis: two-sided
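
Note that `ks.test(x, 'pnorm')` with no further arguments compares `x` with a standard normal, since `pnorm` defaults to `mean = 0, sd = 1`, which helps explain the large D statistic above. `ks.test` also accepts a CDF as a function, so a possible adaptation is to pass the mixture CDF directly. This is a minimal sketch, assuming the mixture weights and parameters from the question; `pmix` is a hypothetical helper name, and the sample is regenerated in vectorized form, so the draws may differ from those in the question.

```r
# Mixture CDF: 25% N(0, sd = 4) + 75% N(4, sd = 2), as stated in the question
pmix <- function(q) {
  0.25 * pnorm(q, mean = 0, sd = 4) + 0.75 * pnorm(q, mean = 4, sd = 2)
}

set.seed(4711)
n <- 500
# Draw the component label first, then the normal variate for that component
comp <- sample(1:2, size = n, replace = TRUE, prob = c(0.25, 0.75))
x <- rnorm(n, mean = c(0, 4)[comp], sd = c(4, 2)[comp])

# One-sample KS test against the fully specified mixture CDF
res <- ks.test(x, pmix)
print(res)
```

Because the mixture parameters are fixed in advance rather than estimated from `x`, the usual one-sample KS p-value applies here. If the parameters were fitted to the same data, the p-value would no longer be valid as reported.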