I am trying detect the number of modes in a variable in R.
My variable was collected in a survey on a 1-7 scale which is why I want to fit mixtures of truncated Gaussian distributions. I am struggling to find a way to do so in R.
Here is an code to produce an example of a simulated mixture of three truncated normal distributions:
#install.packages("truncnorm")
library(truncnorm)
set.seed(123)
# Simulate the first truncated normal distribution
mu1 <- 0.5; sd1 <- 0.5
a1 <- 0; b1 <- 7 # upper and lower bounds
N1 <- 100
sample1 <- rtruncnorm(N1, a=a1, b=b1, mean=mu1, sd=sd1)
# Simulate the second truncated normal distribution
mu2 <- 5; sd2 <- 1.5
a2 <- 0; b2 <- 7
N2 <- 110
sample2 <- rtruncnorm(N2, a=a2, b=b2, mean=mu2, sd=sd2)
# Simulate third truncated normal distribution
mu3 <- 6.8; sd3 <- 0.5
a3 <- 0; b3 <- 7
N3 <- 60
sample3 <- rtruncnorm(N3, a=a3, b=b3, mean=mu3, sd=sd3)
# Merge the two samples
combined_sample <- c(sample1, sample2, sample3)
data <- data.frame(combined_sample,
class = c(rep("sample1", N1), rep("sample2", N2), rep("sample3", N3))) # true class
# Plot the joined distribution for illustration
hist(data$combined_sample, breaks=30, main="Merged Truncated Normal Distributions",
xlab="Value", ylab="Frequency", col="lightblue", border="black")
abline(v=c(mu1, mu2, mu3), col=c("red", "blue", "green"), lwd=2) # Add the true means for reference
I am unsure how to now recover these three truncated distributions from the data using R (i.e., how to find out how many truncated normals this distribution consists of and what their means and variances are).
For Gaussian mixture models without truncation I usually use the mclust
package but for some cases of mixtures of truncated data (such as the example simulation above) it does not reliably recover the number of modes:
# install.packages("mclust")
library(mclust)
# model with lowest BIC has 4 components:
mclustBIC(data$combined_sample)
(Here it overestimates the number of underlying distributions).
Is there a way to account for the truncation of the underlying normal distributions in R?