I am trying to implement evolutionary algorithm using MAP-Elites evolutionary strategy and polynomial mutation operator (as defined here at point 2.) as discussed in this paper. I created a three different MNIST models and trained them using tensorflow==2.0.0a using Keras API. The models are performing fine (all around 95% accuracy).

My understanding of the mentioned evolutionary strategy is that we create a starting population and then with each iteration mutate a randomly chosen specimen from that population. If after the mutation the new specimen gets the certainty of belonging to any class higher than a previously chosen best specimen for that class then we treat it as the best and add it to population. The expected result is that after algorithm is finished we should have a specimen for each of the classes that manages to get classified within that class with high certainty. The initial population of images is created is created using uniform distribution.

The issue is that my models classify random input created with uniform distribution always as the same class with high certainty (i.e CNN model always classifies it as 8). So most or all specimens i end up with are being classified as the same class (with slightly varying certainties of belonging to other classes) even after big starting population and number of iterations (i.e 1000 starting specimens and 20000 iterations).

Input samples are normalized to range [0.0, 1.0]. All of the reasoning bellow is constrained for simplification sake just to Dense model (CNN, and simplified LeNet5 yield similar results) described at the bottom.

Using normal distribution with mean=0.0 and stddev=0.3 or mean=0.5 and stddev=0.3 for generating starting population and mutation chance of 0.3 (instead of 0.1 as in paper) yields similar results.

I tried using (1 , λ) evolutionary strategy with targeting just one class (starting population 100, 100 generations) and it yields better results than the MAP-Elites implemented bellow (i can generate specimen for more than one class).

I tried not normalizing the data for model and training it again using [0, 255] range but the results were almost the same. I also tried using Gaussian mutation operator instead of polynomial but it didn't seem to make much of a difference.

Turning off data augmentation when training doesn't seem to have effect.

Here is the implementation i wrote.

```
def evolutionary_attack(model, population_size, generations_nmb, min=0.0, max=1.0, mutation_chance=0.1, mutation_power=15):
population = [] #
best_in_class = {} #dictionary of specimen performing best for given class
for x in range(population_size):
population.append(np.random.uniform(min, max, model.get_input_shape())) #initialize specimens with random values
# population.append(np.random.normal(0.0, 0.3, model.get_input_shape())) #initialize specimens with random values
for i in range(generations_nmb):
current_specimen = random.choice(population) #choose specimen at random from the population
mutated_specimen = mutate_specimen(current_specimen, min, max, mutation_chance, mutation_power)
logits = model(tf.expand_dims(tf.convert_to_tensor(mutated_specimen, dtype=tf.float32), 0))
certainties_per_class = tf.squeeze(logits).numpy()
for cur_class in range(len(certainties_per_class)):
if cur_class in best_in_class:
_, best_certainty = best_in_class[cur_class]
if certainties_per_class[cur_class] > best_certainty:
#if the new specimen performs better in given class make it a new best and add it to the population
best_in_class[cur_class] = (mutated_specimen, certainties_per_class[cur_class])
population.append(mutated_specimen)
else:
best_in_class[cur_class] = (mutated_specimen, certainties_per_class[cur_class]) #handles the case when there is no best specimen for the given class
def mutate_specimen(specimen, min, max, mutation_chance, mutation_power=15):
specimen = np.copy(specimen)
with np.nditer(specimen, op_flags=['readwrite']) as it:
for old_val in it:
if np.random.uniform() < mutation_chance:
u = np.random.uniform()
if u <= 0.5:
delta = ((2.0 * u) ** (1.0 / (mutation_power))) - 1.0
new_val = old_val + delta * (old_val - min)
else:
delta = 1.0 - (2 * (1 - u))** (1 / (1 + mutation_power))
new_val = old_val + delta * (max - old_val)
old_val[...] = new_val
return np.clip(specimen, min, max)
```

In the paper authors state that they were able to get specimens for each digit classified with >99.99% confidence after 50 generations. This is vastly different from the results that i get. It seems that i am doing something wrong but i am unable to pinpoint the issue. I am not sure whether it is just some small bug in the code or is my reasoning behind implementation wrong.

My model is constructed like this

# DenseModel (sigmoid activation functions on all layers except of last one)

input_1 (InputLayer) [(None, 28, 28, 1)] 0

flatten (Flatten) (None, 784) 0

dense (Dense) (None, 784) 615440

dense_1 (Dense) (None, 800) 628000

dense_2 (Dense) (None, 800) 640800

dense_3 (Dense) (None, 10) 8010

It have been trained for multiple epochs with data augmentation using Adam optimizer.

EDIT: i just noticed that i am not clipping values of specimens after mutation. If i do that then using normal distribution yields similar results to using uniform distribution. i fixed this in posted code. stupid mistake

I think you have misinterpreted MAP-Elites. You are tracking an append-only population separately from the elites, but according to this reference you are supposed to track only the elites. So I think the line to select the current specimen should be:

So the maximum population size would be 10 for MNIST. I'm not 100% sure that this is your main problem, but it should at least make the algorithm more greedy, moving search away from the oldest solutions.

## Modification of MAP-Elites

Thinking about this some more, I'm not sure why this method should work on MNIST with direct pixel parameters. It would be very unlikely to generate a different digit just by random mutation, especially not after a few steps of optimization towards that same class, with all the other classes remaining empty.

It seems your implementation is actually doing this part correctly according to the paper:

But in the original MAP-Elites, the goal was to track a single global fitness metric over the whole population. It was possible that some cells remained empty because no solution was ever mapped to this cell. The paper actually tracks a different fitness metric for each cell. This should have been stated more prominently as a modification of MAP-Elites. With this modification it seems plausible that this should work, but it no longer makes sense to allow empty cells.