I have created a dataset containing points labelled 0 if they lie inside a circle and 1 if they lie outside it. I wanted to see whether a simple neural network can learn this binary classification problem.
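For reference, here is a minimal sketch of how such a dataset can be generated; the radius, the sample count of 5000, and the column layout (coordinates in columns 0–1, label in column 2) are my assumptions, not necessarily the original setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# sample 5000 points uniformly in the square [-2, 2] x [-2, 2]
points = rng.uniform(-2.0, 2.0, size=(5000, 2))

# label 0 for points inside the unit circle, 1 for points outside it
labels = (points[:, 0] ** 2 + points[:, 1] ** 2 > 1.0).astype(float)

# store as one array: columns 0-1 are coordinates, column 2 is the label
data = np.column_stack([points, labels])
print(data.shape)  # (5000, 3)
```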

Now, I have previously dealt with the XOR problem, which inspired me to use two hidden layers here. Since a circle is a conic, like the pair of straight lines underlying the XOR problem, this makes some sense; otherwise I would have gone for a single hidden layer. No issues there. What I am having trouble interpreting is the effect of the number of units/neurons in each layer.

Naturally, I have experimented:

```python
import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense

def layer_experiment(first=2, second=1):
    print("Units per hidden layer:", first, second)
    # create model
    model = Sequential()
    model.add(Dense(first, input_dim=2, activation='relu'))
    model.add(Dense(second, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit on the first 4000 rows; columns 0-1 are coordinates, column 2 is the label
    model.fit(data[:4000, [0, 1]], data[:4000, 2], epochs=30, batch_size=20, verbose=0)
    scores = model.evaluate(data[4000:, :2], data[4000:, 2])
    print("Scores:", scores)
    # predict() returns the sigmoid probabilities; predict_proba() was removed in newer Keras
    predictions = model.predict(data[4000:, :2])
    # migrate() thresholds a probability into a 0/1 label (defined elsewhere in the notebook)
    predictions = [[migrate(x[0])] for x in predictions]
    dd = np.append(data[4000:, :2], predictions, axis=-1)
    df = pd.DataFrame(dd)
    fg = seaborn.FacetGrid(data=df, hue=2, aspect=1.61)
    fg.map(plt.scatter, 0, 1).add_legend()
```

I was wondering why changing these parameters changes the plots, or more specifically, the hyperplanes separating the data. More code here: https://colab.research.google.com/drive/14HYdrUxvc5REUdToFkfZQxElPq_0l7-g
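One way to build intuition about why the unit counts reshape the boundary: each first-layer ReLU unit is active only on one side of the line w·x + b = 0, so it contributes one linear facet, and the decision boundary is pieced together from these facets. A small sketch with hand-picked weights (purely illustrative, not taken from the trained model):

```python
import numpy as np

# one ReLU unit: active only where w . x + b > 0, i.e. on one side of a line
w = np.array([1.0, 1.0])
b = -1.0

def relu_unit(x):
    return max(0.0, float(np.dot(w, x) + b))

print(relu_unit(np.array([1.0, 1.0])))  # active side of the line: 1.0
print(relu_unit(np.array([0.0, 0.0])))  # inactive side of the line: 0.0

# with only `first` such units in the first hidden layer, the boundary has
# at most `first` linear pieces, so few units approximate a circle crudely
```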

Five unique points determine a conic in the plane (https://en.wikipedia.org/wiki/Five_points_determine_a_conic), so if each neuron models one of the two co-ordinates of a point, you could try 10 units.
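To illustrate the five-points fact: the conic a·x² + b·xy + c·y² + d·x + e·y + f = 0 through five given points can be recovered as the null space of a 5×6 linear system. A sketch in plain numpy (my own illustration, unrelated to the network code):

```python
import numpy as np

def fit_conic(pts):
    """Fit a x^2 + b xy + c y^2 + d x + e y + f = 0 through five points."""
    rows = [[x * x, x * y, y * y, x, y, 1.0] for x, y in pts]
    # the conic coefficients span the null space of this 5x6 matrix;
    # take the right-singular vector for the smallest singular value
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1]

# five points on the unit circle x^2 + y^2 - 1 = 0
angles = [0.1, 0.9, 2.0, 3.5, 5.0]
pts = [(np.cos(t), np.sin(t)) for t in angles]
coeffs = fit_conic(pts)

# normalize so the x^2 coefficient is 1; expect roughly (1, 0, 1, 0, 0, -1),
# i.e. the circle is recovered
print(np.round(coeffs / coeffs[0], 6))
```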