Uanble to recreate built-in posterior() in R Toy example provided

46 views Asked by At

I create a toy data set that can be reproduced:

# Set a seed for reproducibility
set.seed(123)

# Generate Data
n <- 100  # Number of observations
X <- rnorm(n)  # Simulated predictor variable

# Define parameters for two regression models
beta1 <- c(1, 2)  # Coefficients for regression model 1
beta2 <- c(4, 6)  # Coefficients for regression model 2

# Simulate Mixture Components
# Simulate response variables for each component
Y1 <- beta1[1] + beta1[2] * X + rnorm(n, mean = 0, sd = 1)  # Component 1
Y2 <- beta2[1] + beta2[2] * X + rnorm(n, mean = 0, sd = 1)  # Component 2

# Generate Mixing Weights
# Define mixing weights (probability of belonging to each component)
w1 <- 0.6  # Weight for Component 1
w2 <- 0.4  # Weight for Component 2

# Combine Components
# Generate final response variable based on mixing weights
Y <- ifelse(runif(n) < w1, Y1, Y2)  # Simulated mixture of two regressions

I then fit a model like so:

# Fit a mixture of two regressions
library(flexmix)
fit <- flexmix( Y~X, k = 2,control = list( iter = 100))

To get posterior probabilities for cluster assignments, I can use the function posterior() as follows:

# Get posterior probs
p  <-posterior(fit)

How could I recreate such posterior probabilities for a single chosen point, for example:

p[81,]

which returns:

0.7356298 0.2643702

Here is my attempt:

# Predict for the two components                
predicted     <- predict(fit)

# Get coeffs
coeffs1<-parameters(fit, component = 1)
coeffs2<-parameters(fit, component = 2)

# Get prior probs
probs<-summary(fit)@comptab$prior

# Recreate component estimates and posterior probs for a selected observation
index <- 81
my_Y  <- Y[index]
my_X  <- X[index]

# Get estimates for each component separately
f1 <- coeffs1[1] + coeffs1[2]*my_X;f1
f2 <- coeffs2[1] + coeffs2[2]*my_X;f2

#check if match
predicted[[1]][index]==f1
predicted[[2]][index]==f2

# Get estimated parameters
params <- cbind(coeffs1,coeffs2)

# Number of components
k <- ncol(params)

# Initialize an empty matrix for posterior probabilities
posterior_probs <- matrix(NA, nrow = n, ncol = k)

# Calculate the density for each component and then the posterior probabilities
for (i in 1:k) {
  # Calculate the density for component i
  fj_x <- dnorm(Y, mean = params[1, i] + params[2, i] * X, sd = sqrt(params[3, i]))
  
  # Calculate the posterior probability for component i
  posterior_probs[, i] <- probs[i] * fj_x
}

# Normalize the posterior probabilities
posterior_probs <- posterior_probs / rowSums(posterior_probs)

# Now, posterior_probs[i, j] contains the posterior probability that observation i belongs to component j
posterior_probs[index,]

which returns 0.7010621 0.2989379

Where am I going wrong?

0

There are 0 answers