I have implemented an autoencoder that should realize a non-linear version of principal component analysis. Input and output of the model are the same dataset with n features, and I am interested in the encoding, which has dimension d < n. To generalize principal component analysis, I would like the encoding to consist of d almost linearly independent vectors, but with the loss function "mse" I get, e.g. for d = 2, two vectors that look almost the same.
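For reference, a minimal sketch of my setup (layer sizes, activations and the layer name are placeholders, not my exact model):

```python
from tensorflow import keras
from tensorflow.keras import layers

n = 10   # number of input features (placeholder)
d = 2    # encoding dimension, d < n

inputs = keras.Input(shape=(n,))
encoded = layers.Dense(d, activation="tanh", name="encoding")(inputs)
decoded = layers.Dense(n, activation="linear")(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
```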
Theoretically I could use a loss function that includes a penalty term for encoding vectors that are similar and far from independent. But that would mean a loss function that uses information from the whole batch, not just a single sample, and that looks at an intermediate layer rather than at the output; a rough sketch of what I mean follows below.
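The closest mechanism I have found so far is a custom activity regularizer, since Keras passes it the full batch of activations of the layer it is attached to. The sketch below penalizes the off-diagonal entries of the batch covariance of the encoding; the class name, the weight `lam`, and the covariance-based penalty are my own guesses, not an established recipe, so I am unsure whether this is the intended approach:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class DecorrelationPenalty(keras.regularizers.Regularizer):
    """Penalizes off-diagonal entries of the batch covariance of the
    activations, pushing the d encoding components towards being
    uncorrelated over the batch."""

    def __init__(self, lam=0.1):
        self.lam = lam  # penalty weight (placeholder value)

    def __call__(self, activations):
        # Center the activations over the batch dimension.
        centered = activations - tf.reduce_mean(activations, axis=0, keepdims=True)
        batch = tf.cast(tf.shape(activations)[0], activations.dtype)
        cov = tf.matmul(centered, centered, transpose_a=True) / batch
        # Zero out the diagonal so only cross-correlations are penalized.
        off_diag = cov - tf.linalg.diag(tf.linalg.diag_part(cov))
        return self.lam * tf.reduce_sum(tf.square(off_diag))

    def get_config(self):
        return {"lam": self.lam}

# Attached to the encoding layer from the sketch above:
encoded = layers.Dense(d, activation="tanh",
                       activity_regularizer=DecorrelationPenalty(0.1),
                       name="encoding")(inputs)
```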
Since I am working with Keras: can anyone give me a hint or a reference on how to approach this problem in Keras?