I am training a neural network to compute the inverse of a 3x3 matrix. I am using a small Keras dense model (`nb_hidden_layers = 1`, 9 neurons per layer). The hidden layers use the 'relu' activation and the output layer is linear. The training data consists of 10,000 random matrices with determinant 1. The results are not very good: the validation RMSE is in the hundreds. I have tried more layers, more neurons and other activation functions, but the improvement is very small. Here is the code:
```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def generator(nb_samples, matrix_size=2, entries_range=(0, 1), determinant=None):
    '''
    Generate nb_samples random matrices of size matrix_size with float
    entries in the interval entries_range.
    If determinant is given, rescale each matrix to that determinant and
    return (matrices, inverses); otherwise return (matrices, determinants).
    '''
    matrices = []
    if determinant:
        inverses = []
        for i in range(nb_samples):
            matrix = np.random.uniform(entries_range[0], entries_range[1], (matrix_size, matrix_size))
            # The determinant is linear in each row, so scaling row 0 by
            # determinant / det(matrix) sets det(matrix) to determinant
            matrix[0] *= determinant / np.linalg.det(matrix)
            matrices.append(matrix.reshape(matrix_size**2,))
            inverses.append(np.linalg.inv(matrix).reshape(matrix_size**2,))
        return np.array(matrices), np.array(inverses)
    else:
        determinants = []
        for i in range(nb_samples):
            matrix = np.random.uniform(entries_range[0], entries_range[1], (matrix_size, matrix_size))
            determinants.append(np.array(np.linalg.det(matrix)).reshape(1,))
            matrices.append(matrix.reshape(matrix_size**2,))
        return np.array(matrices), np.array(determinants)

### Select number of samples, matrix size and range of entries in matrices
nb_samples = 10000
matrix_size = 3
entries_range = (0, 100)
determinant = 1

### Generate random matrices and their inverses
matrices, inverses = generator(nb_samples, matrix_size=matrix_size, entries_range=entries_range, determinant=determinant)

### Select number of layers and neurons
nb_hidden_layers = 1
nb_neurons = matrix_size**2
activation = 'relu'

### Create a dense network: a first relu layer fed by the input, then
### nb_hidden_layers further relu layers (all with nb_neurons neurons),
### then a linear output layer
model = Sequential()
model.add(Dense(nb_neurons, input_dim=matrix_size**2, activation=activation))
for i in range(nb_hidden_layers):
    model.add(Dense(nb_neurons, activation=activation))
model.add(Dense(matrix_size**2))
model.compile(loss='mse', optimizer='adam')

### Train the model, holding out the last 33% of the samples for validation
history = model.fit(matrices, inverses, epochs=400, batch_size=100, verbose=0, validation_split=0.33)

### Get validation loss from object 'history'
rmse = np.sqrt(history.history['val_loss'][-1])

### Print RMSE and parameter values
print('''
Validation RMSE: {}
Number of hidden layers: {}
Number of neurons: {}
Number of samples: {}
Matrices size: {}
Range of entries: {}
Determinant: {}
'''.format(rmse, nb_hidden_layers, nb_neurons, nb_samples, matrix_size, entries_range, determinant))
```
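To see what the regression targets look like, here is a small sanity check of the generator's construction. It is a sketch, assuming NumPy's `default_rng`; `unit_det_matrix`, `adjugate_3x3` and `relative_error` are hypothetical helpers, not part of the question's code. Scaling one row of a matrix multiplies its determinant by the same factor, so dividing row 0 by det(M) gives a matrix with determinant 1, and for such a matrix `inv(M) = adj(M) / det(M)` reduces to the adjugate. For a 3x3 matrix with rows r0, r1, r2, the columns of the adjugate are the cross products r1×r2, r2×r0 and r0×r1, so every target entry is a difference of products of two input entries.

```python
import numpy as np

def unit_det_matrix(rng, size=3, low=0.0, high=100.0):
    # Same construction as `generator` above: the determinant is linear in
    # each row, so dividing row 0 by det(M) makes det(M) equal to 1.
    m = rng.uniform(low, high, (size, size))
    m[0] *= 1.0 / np.linalg.det(m)
    return m

def adjugate_3x3(m):
    # For rows r0, r1, r2 the adjugate's columns are r1 x r2, r2 x r0, r0 x r1,
    # which gives m @ adj(m) == det(m) * I.
    r0, r1, r2 = m
    return np.stack([np.cross(r1, r2), np.cross(r2, r0), np.cross(r0, r1)], axis=1)

def relative_error(a, b):
    # Frobenius-norm error of a relative to b
    return np.linalg.norm(a - b) / np.linalg.norm(b)

rng = np.random.default_rng(0)
m = unit_det_matrix(rng)
inv = np.linalg.inv(m)
adj = adjugate_3x3(m)

print(np.linalg.det(m))          # close to 1.0
print(relative_error(inv, adj))  # tiny: inv(m) and adj(m) agree when det(m) == 1
print(np.abs(inv[:, 0]).max())   # largest entry in the first column of the inverse
```

The first column of the inverse holds the cross product r1×r2, which is built only from the unscaled rows 1 and 2. With entries drawn from (0, 100), those targets are bounded by 10^4 in absolute value rather than by the size of the inputs.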
I have looked online, and there seem to be papers that deal with approximating matrix inverses. However, before changing the model, I would like to know whether there are other parameters I could change that would have a bigger impact on the error. I hope someone can provide some insight. Thank you.
Inverting a 3x3 matrix is pretty difficult for a neural network, because neural networks tend to be bad at multiplying or dividing activations. I wasn't able to get it to work with a simple dense network, but a 7-layer ResNet (a network with residual, i.e. skip, connections) does the trick. It has millions of weights, so it needs far more than 10,000 examples. I found that it completely memorized training sets of up to 100,000 samples and overfit badly even with 10,000,000 samples, so instead I generated samples continuously and fed each sample to the network only once, as it was generated.
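The two ideas in this answer, a residual network and a stream of freshly generated training samples, can be sketched in Keras roughly as follows. This is a hedged illustration, not the answerer's actual network: `residual_mlp` and `fresh_batches` are hypothetical helpers, and the width, number of residual blocks, batch size and step counts are placeholder assumptions. Because the generator samples new matrices on every call and never repeats a batch, each training example is seen once; `model.fit` needs `steps_per_epoch` since the generator never ends.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Add, Dense

MATRIX_SIZE = 3
N_ENTRIES = MATRIX_SIZE ** 2


def residual_mlp(width=512, n_blocks=3):
    """Hypothetical residual MLP: a Dense projection, then n_blocks blocks
    computing x + F(x) with F = Dense(relu) -> Dense, then a linear output."""
    inputs = Input(shape=(N_ENTRIES,))
    x = Dense(width, activation="relu")(inputs)
    for _ in range(n_blocks):
        h = Dense(width, activation="relu")(x)
        h = Dense(width)(h)
        x = Add()([x, h])  # skip connection: the block adds F(x) to x
    outputs = Dense(N_ENTRIES)(x)  # linear output, one unit per inverse entry
    return Model(inputs, outputs)


def fresh_batches(batch_size=256, low=0.0, high=100.0, seed=None):
    """Hypothetical infinite generator: every batch is newly sampled, so no
    training matrix is reused."""
    rng = np.random.default_rng(seed)
    while True:
        m = rng.uniform(low, high, (batch_size, MATRIX_SIZE, MATRIX_SIZE))
        # Same trick as `generator` in the question: rescale row 0 so det == 1.
        m[:, 0, :] /= np.linalg.det(m)[:, None]
        inv = np.linalg.inv(m)  # batched inverse over the leading axis
        yield (m.reshape(batch_size, N_ENTRIES).astype("float32"),
               inv.reshape(batch_size, N_ENTRIES).astype("float32"))


model = residual_mlp()
model.compile(loss="mse", optimizer="adam")
# These step counts are only a smoke test; the answer describes training on
# many millions of generated samples.
history = model.fit(fresh_batches(seed=0), steps_per_epoch=20, epochs=2, verbose=0)
print(history.history["loss"])
```

Generating data on the fly trades CPU time for never overfitting to a fixed training set, which is the point the answer makes about memorization.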