Fitting mixtures of truncated normal distributions in R

93 views Asked by At

I am trying detect the number of modes in a variable in R.

My variable was collected in a survey on a 1-7 scale which is why I want to fit mixtures of truncated Gaussian distributions. I am struggling to find a way to do so in R.

Here is an code to produce an example of a simulated mixture of three truncated normal distributions:

#install.packages("truncnorm")
library(truncnorm)

set.seed(123)

# Simulate the first truncated normal distribution
mu1 <- 0.5; sd1 <- 0.5
a1 <- 0; b1 <- 7 # upper and lower bounds
N1 <- 100
sample1 <- rtruncnorm(N1, a=a1, b=b1, mean=mu1, sd=sd1)

# Simulate the second truncated normal distribution
mu2 <- 5; sd2 <- 1.5
a2 <- 0; b2 <- 7
N2 <- 110
sample2 <- rtruncnorm(N2, a=a2, b=b2, mean=mu2, sd=sd2)

# Simulate third truncated normal distribution
mu3 <- 6.8; sd3 <- 0.5
a3 <- 0; b3 <- 7
N3 <- 60
sample3 <- rtruncnorm(N3, a=a3, b=b3, mean=mu3, sd=sd3)

# Merge the two samples
combined_sample <- c(sample1, sample2, sample3)
data <- data.frame(combined_sample, 
                   class = c(rep("sample1", N1), rep("sample2", N2), rep("sample3", N3))) # true class

# Plot the joined distribution for illustration
hist(data$combined_sample, breaks=30, main="Merged Truncated Normal Distributions", 
     xlab="Value", ylab="Frequency", col="lightblue", border="black")
abline(v=c(mu1, mu2, mu3), col=c("red", "blue", "green"), lwd=2) # Add the true means for reference

I am unsure how to now recover these three truncated distributions from the data using R (i.e., how to find out how many truncated normals this distribution consists of and what their means and variances are).

For Gaussian mixture models without truncation I usually use the mclust package but for some cases of mixtures of truncated data (such as the example simulation above) it does not reliably recover the number of modes:

# install.packages("mclust")
library(mclust)

# model with lowest BIC has 4 components: 
mclustBIC(data$combined_sample)

(Here it overestimates the number of underlying distributions).

Is there a way to account for the truncation of the underlying normal distributions in R?

0

There are 0 answers