I want to classify actions in videos. To do this, I apply K-means clustering to optical flow points to generate a codebook.
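For readers less familiar with the codebook approach, here is a minimal sketch of that kind of pipeline. It assumes plain Lloyd's k-means written in NumPy and uses 2-D flow vectors (dx, dy) as stand-in descriptors. The function names `build_codebook`, `assign` and `encode` are hypothetical, and real optical-flow descriptors would usually have more dimensions. The k centroids act as codewords, and each video is typically summarised as a k-bin histogram counting how many of its descriptors fall nearest to each codeword.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=50, seed=0):
    """Cluster pooled training descriptors with Lloyd's k-means; the k centroids are the codewords."""
    rng = np.random.default_rng(seed)
    # Start from k distinct descriptors chosen at random.
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # leave a centroid in place if its cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def assign(descriptors, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for every descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def encode(descriptors, codebook):
    """Normalised bag-of-words histogram of one video's descriptors."""
    counts = np.bincount(assign(descriptors, codebook), minlength=len(codebook))
    return counts / counts.sum()  # normalise so clips with more points compare fairly

# Hypothetical usage: 2-D flow vectors pooled from training videos, in two motion groups.
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2))])
codebook = build_codebook(train, k=2)
video = rng.normal(0, 1, (30, 2))  # one new clip whose motion is all near (0, 0)
hist = encode(video, codebook)     # feature vector of length k for the classifier
```

Because every video is mapped to a vector of length k, changing k changes the feature space the classifier is trained on, which is why the codebook size can affect accuracy.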

With k=200, accuracy is initially 85%. After I add more training data, it drops to 50%. With k=400, accuracy returns to 85%.

How can the value of k be automatically optimized for my training data?

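K-means clustering minimizes the within-cluster MSE, and increasing k generally keeps lowering that error, so the lowest MSE alone cannot tell you the best k. The answer for an optimum number of clusters is "It Depends" on your data. You can use the elbow method to find it: run K-means for a range of k values, plot the error against k, and pick the k where the curve bends, i.e. where adding more clusters stops giving much improvement. Here is one link that you can go through to find more details.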
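To make the elbow method automatic, one simple heuristic (among several) is to compute the sum of squared errors (SSE) for each candidate k and pick the k whose point lies farthest below the straight line joining the first and last points of the curve, after scaling both axes to [0, 1]. The sketch below assumes NumPy, a hand-written k-means with k-means++ seeding and a few restarts, and synthetic 2-D data; `kmeans_sse` and `elbow_k` are hypothetical helper names, not part of any library.

```python
import numpy as np

def kmeans_sse(points, k, n_init=5, n_iter=50, seed=0):
    """Lowest within-cluster sum of squared errors over n_init k-means++ runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # k-means++ seeding: pick each new centroid with probability proportional
        # to its squared distance from the centroids chosen so far.
        centroids = points[[rng.integers(len(points))]].astype(float)
        while len(centroids) < k:
            d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            nxt = points[rng.choice(len(points), p=d2 / d2.sum())]
            centroids = np.vstack([centroids, nxt])
        # Lloyd iterations: assign to nearest centroid, then move centroids to the mean.
        for _ in range(n_iter):
            d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = points[labels == j].mean(axis=0)
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best

def elbow_k(ks, sse):
    """Knee of a decreasing SSE curve: the k farthest below the chord from the
    first to the last point, with both axes scaled to [0, 1]."""
    ks = list(ks)
    x = (np.array(ks, float) - ks[0]) / (ks[-1] - ks[0])
    y = (np.array(sse, float) - min(sse)) / (max(sse) - min(sse))
    chord = y[0] + (y[-1] - y[0]) * x  # straight line joining the end points
    return ks[int(np.argmax(chord - y))]

# Hypothetical descriptors: four well-separated groups of 2-D flow vectors.
rng = np.random.default_rng(0)
centres = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = np.vstack([rng.normal(c, 0.5, (50, 2)) for c in centres])
ks = range(1, 9)
sse = [kmeans_sse(points, k) for k in ks]
best_k = elbow_k(ks, sse)  # the bend in the SSE curve
```

For large sets of flow descriptors you would more likely use scikit-learn's `KMeans` (its `inertia_` attribute is this SSE) or `MiniBatchKMeans` rather than the broadcasting loop above, which builds an n × k × d array. Note also that the elbow only measures how well the codebook fits the descriptors; it is not guaranteed to give the k with the best classification accuracy, so it is worth checking the chosen k on held-out videos.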