I create a toy data set that can be reproduced:
# Set a seed for reproducibility
set.seed(123)
# Generate Data
n <- 100 # Number of observations
X <- rnorm(n) # Simulated predictor variable
# Define parameters for two regression models
beta1 <- c(1, 2) # Coefficients for regression model 1
beta2 <- c(4, 6) # Coefficients for regression model 2
# Simulate Mixture Components
# Simulate response variables for each component
Y1 <- beta1[1] + beta1[2] * X + rnorm(n, mean = 0, sd = 1) # Component 1
Y2 <- beta2[1] + beta2[2] * X + rnorm(n, mean = 0, sd = 1) # Component 2
# Generate Mixing Weights
# Define mixing weights (probability of belonging to each component)
w1 <- 0.6 # Weight for Component 1
w2 <- 0.4 # Weight for Component 2
# Combine Components
# Generate final response variable based on mixing weights
Y <- ifelse(runif(n) < w1, Y1, Y2) # Simulated mixture of two regressions
I then fit a model like so:
# Fit a mixture of two regressions
library(flexmix)
fit <- flexmix( Y~X, k = 2,control = list( iter = 100))
To get posterior probabilities for cluster assignments, I can use the function posterior()
as follows:
# Get posterior probs
p <-posterior(fit)
How could I recreate such posterior probabilities for a single chosen point, for example:
p[81,]
which returns:
0.7356298 0.2643702
Here is my attempt:
# Predict for the two components
predicted <- predict(fit)
# Get coeffs
coeffs1<-parameters(fit, component = 1)
coeffs2<-parameters(fit, component = 2)
# Get prior probs
probs<-summary(fit)@comptab$prior
# Recreate component estimates and posterior probs for a selected observation
index <- 81
my_Y <- Y[index]
my_X <- X[index]
# Get estimates for each component separately
f1 <- coeffs1[1] + coeffs1[2]*my_X;f1
f2 <- coeffs2[1] + coeffs2[2]*my_X;f2
#check if match
predicted[[1]][index]==f1
predicted[[2]][index]==f2
# Get estimated parameters
params <- cbind(coeffs1,coeffs2)
# Number of components
k <- ncol(params)
# Initialize an empty matrix for posterior probabilities
posterior_probs <- matrix(NA, nrow = n, ncol = k)
# Calculate the density for each component and then the posterior probabilities
for (i in 1:k) {
# Calculate the density for component i
fj_x <- dnorm(Y, mean = params[1, i] + params[2, i] * X, sd = sqrt(params[3, i]))
# Calculate the posterior probability for component i
posterior_probs[, i] <- probs[i] * fj_x
}
# Normalize the posterior probabilities
posterior_probs <- posterior_probs / rowSums(posterior_probs)
# Now, posterior_probs[i, j] contains the posterior probability that observation i belongs to component j
posterior_probs[index,]
which returns 0.7010621 0.2989379
Where am I going wrong?