How to check the correctness of an implemented expectation-maximization algorithm in R?


Good afternoon!

I have asked this question several times without receiving any reply or comment (even on Cross Validated)!

In R, I have implemented the expectation-maximization algorithm for a Gaussian mixture model:

k=3   # number of known clusters 
w_k=rep(1,k)/k  # we initialize the cluster weights to 1/k each 
n_j=rep(0,k)    # will be used later
print(w_k)      # printing
data=as.matrix(iris[1:150,-5]) # numerical dataset with 4 dimensions/columns; the fifth column contains the cluster labels.

means=sample(1:dim(data)[1],k,replace=FALSE) # We choose "k" rows at random to be the initial cluster means; means contains the row indices.
mu=as.matrix(iris[means,-5])  # We retrieve those randomly chosen means as a matrix of k rows and 4 columns.
sigma=cov(data)      # We compute the covariance matrix of the whole dataset. 
print(sigma)
print(solve(sigma))
sigma_list=rep(list(sigma),k) # For each of the k clusters, we initialize its covariance matrix to sigma. 
 
# w_k , mu and sigma_list are the 3 parameters to update during the M-step of expectation maximization (EM algorithm).

# Function to compute P(Xi/Cj), i.e. the likelihood of an observation given a cluster.
# example : P_Xi_Cj(Xi=data[1,], Cj=mu[3,], sigma=sigma)  // P(X1/C3), likelihood of observation 1 given cluster 3.
P_Xi_Cj <- function(Xi , Cj , sigma) {
    x=Xi
    mu=Cj
    d=length(x)
    # univariate special case: sigma, x and mu are all scalars
    if (length(as.vector(sigma))==1 && length(as.vector(x))==1 && length(as.vector(mu))==1){
        return(as.numeric(1/sqrt(abs(sigma)*(2*pi)^d)*exp(-1/2*(x-mu)*(1/sigma)*(x-mu))))
    }
    
    tr1=matrix(x-mu,ncol=1)
    tr2=matrix(x-mu,nrow=1)
    inverse=solve(sigma)
    total=tr2 %*% inverse %*% tr1
    return(as.numeric(1/sqrt(abs(det(sigma))*(2*pi)^d)*exp(-1/2*total)))
}
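
As a minimal sanity check of this density (assuming the mvtnorm package is installed; it is not used anywhere else in this code), it can be compared against mvtnorm::dmvnorm; the two values should agree up to floating-point error:

# Sketch: cross-check the hand-written density against mvtnorm::dmvnorm.
library(mvtnorm)
mine = P_Xi_Cj(Xi=data[1,], Cj=mu[3,], sigma=sigma)
ref  = dmvnorm(data[1,], mean=mu[3,], sigma=sigma)
print(c(mine=mine, reference=ref, difference=mine-ref))  # difference should be ~0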



# Computing P(Cj/Xi) : which cluster is most likely given the observation? 

P_Cj_Xi<-function(Xi , mu , sigma_list , W_k=w_k , n_clusters=k){
# precision matrices: the inverses of the cluster covariance matrices
r=lapply(1:n_clusters , function(i) solve(matrix(unlist(sigma_list[i]),ncol=length(Xi))))     
# quadratic forms -1/2 * (Xi-mu_j)' %*% Sigma_j^{-1} %*% (Xi-mu_j)
n_j=sapply(1:n_clusters, function(i) -1/2*(rbind(Xi-mu[i,]))%*%as.matrix(r[[i]])%*%(cbind(Xi-mu[i,])))          
# unnormalized posteriors: w_j * |Sigma_j|^(-1/2) * exp(quadratic form); note det(r_j) = 1/det(Sigma_j)
unnorm=((sapply(1:n_clusters,function(i) abs(det(as.matrix(r[[i]])))))^(1/2))*exp(n_j)*W_k
P_C_Xi=unnorm/sum(unnorm)
names(P_C_Xi)=paste("P(Cj/Xi)",1:n_clusters)
               
return(P_C_Xi)        
}
# example :  P_Cj_Xi(Xi=data[1,],mu=mu ,sigma_list=sigma_list,W_k=w_k,n_clusters=k)  i.e P(Cj/X1) j=1,..,n_clusters
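
A quick sanity check on this function: for any observation, the returned posterior probabilities over the k clusters must sum to 1:

p = P_Cj_Xi(Xi=data[1,], mu=mu, sigma_list=sigma_list, W_k=w_k, n_clusters=k)
print(p)
print(sum(p))   # must be (numerically) equal to 1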
                
M_step<-function(data , mu , sigma_list , W_k , n_clusters=k){
l=lapply(1:nrow(data) , function(i) P_Cj_Xi(Xi=data[i,],mu,sigma_list,W_k,n_clusters=k) )  # E-step: posteriors P(Cj/Xi) for every observation 
sum_P_Cj_Xi=as.vector(Reduce("+", l))  # effective number of observations assigned to each cluster
W_j=sum_P_Cj_Xi/nrow(data) # updating the cluster weights (eq. 7)
mu=t(sapply(1:k, function(j) Reduce("+",lapply(1:nrow(data), function(i) l[[i]][j]*data[i,]/sum_P_Cj_Xi[j])))) # updating the cluster means 
sigma=lapply(1:k, function(j) Reduce("+",lapply(1:nrow(data), function(i) l[[i]][j]*(data[i,]-mu[j,])%*%t(data[i,]-mu[j,])/sum_P_Cj_Xi[j]))) # updating the cluster covariance matrices         
print(list(mu,sigma,W_j)) # printing for debugging
return(list(mu,sigma,W_j)) 
}                
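
Before running the full loop, a single call to M_step can be checked against a few basic invariants (a small sketch using the initial parameters defined above): the updated weights must sum to 1, the updated means must form a k x 4 matrix, and each updated covariance matrix must be symmetric:

out = M_step(data=data, mu=mu, sigma_list=sigma_list, W_k=w_k, n_clusters=k)
print(sum(out[[3]]))  # updated weights: should sum to 1
print(dim(out[[1]]))  # updated means: k rows, 4 columns
print(sapply(out[[2]], function(S) max(abs(S - t(S)))))  # should be ~0: covariances symmetric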

max_iter=6                        # total number of EM iterations
tmp=list(mu, sigma_list, w_k)     # start from the initial parameters
for (t in 1:max_iter) {
print(c("iteration : ",t))
# each call performs one E-step (inside M_step) followed by one M-step update
tmp=M_step(data=data, mu=tmp[[1]], sigma_list=tmp[[2]], W_k=tmp[[3]], n_clusters=k)
}
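
One property worth checking: under a correct EM implementation the observed-data log-likelihood never decreases from one iteration to the next. Below is a small helper, log_lik (a name introduced here, not part of the algorithm above), that computes it; calling it after each M_step call inside the loop and verifying that the sequence is non-decreasing is the classic EM sanity check. Assuming the mclust package is available, the final parameters can also be compared against mclust::Mclust as a reference implementation:

# Sketch: observed-data log-likelihood of the fitted mixture.
log_lik = function(data, mu, sigma_list, W_k, n_clusters=k) {
  sum(sapply(1:nrow(data), function(i) {
    dens = sapply(1:n_clusters, function(j) P_Xi_Cj(Xi=data[i,], Cj=mu[j,], sigma=sigma_list[[j]]))
    log(sum(W_k * dens))
  }))
}
print(log_lik(data, tmp[[1]], tmp[[2]], tmp[[3]]))  # log-likelihood of the final fit

# Optional cross-check against a reference GMM implementation (requires the mclust package;
# "VVV" allows a separate, unconstrained covariance matrix per cluster).
library(mclust)
fit = Mclust(data, G=k, modelNames="VVV")
print(fit$loglik)            # reference log-likelihood
print(fit$parameters$mean)   # cluster means as columns; compare with t(tmp[[1]])

Since EM can converge to a local optimum depending on the random initialization, the two log-likelihoods need not be identical, but they should be of comparable magnitude.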

This implementation is based on the equations explained in Part 2 of this work: https://arxiv.org/ftp/arxiv/papers/1603/1603.07879.pdf

My question is simple. According to the EM algorithm:

Are the obtained results/outputs of the updated parameters (mu, sigma_list, w_k) correct, i.e. do they match the EM approach?

I hope my question is simple and clear!

Thank you in advance for your help!
