Number of neurons in dense layer in CNN


I want to ask a question about the number of neurons used in the dense layers of a CNN. From what I have seen, dense layers generally use 16, 32, 64, 128, 256, 512, 1024, or 2048 neurons. So is a descending or an ascending order better before the output layer?

For example

model.add(Dense(2048, kernel_regularizer='l2', activation='relu'))
model.add(Dense(1024, kernel_regularizer='l2', activation='relu'))
model.add(Dense(512, kernel_regularizer='l2', activation='relu'))
model.add(Dense(128, kernel_regularizer='l2', activation='relu'))

or

model.add(Dense(128, kernel_regularizer='l2', activation='relu'))
model.add(Dense(512, kernel_regularizer='l2', activation='relu'))
model.add(Dense(1024, kernel_regularizer='l2', activation='relu'))
model.add(Dense(2048, kernel_regularizer='l2', activation='relu'))

Could you please give an answer with an explanation as well? Thank you.


There are 2 answers

Hossein On BEST ANSWER

TLDR:

You can use either of them, really, but it depends on many criteria.

Semi-Long Explanation:

You can use either of those, but they carry different implications. Basically, you want your number of neurons to increase as the size of your feature map decreases, in order to retain nearly the same representational power. The same reasoning applies when it comes to developing more abstract features, which I'll talk about shortly.
This is why in a lot of papers you see the network start with a small number of filters and gradually increase it.
The intuition behind this is that early layers deal with primitive concepts, so having a large number of neurons wouldn't really help beyond some point. But as you go deeper, the hierarchy of abstractions gets richer and richer, and you want to be able to capture as much information as you can and create new, higher, richer abstractions. This is why you increase the number of neurons as you go deeper.

On the other hand, when you reach the end of the network, you want to choose the best features out of all the features you have developed so far, so you start to gradually decrease the number of neurons, and hopefully you end up with the features that matter most to your specific task.

Different architectural designs have different implications and are based on different intuitions about the task at hand. You need to choose the best strategy based on your needs.
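The "retain representational power" intuition above can be sanity-checked with a little arithmetic. The stage sizes below are a hypothetical VGG-style layer plan (not from the question): each pooling step quarters the spatial area, so doubling the filter count halves the activation volume rather than preserving it exactly, but it keeps the representation from collapsing.

```python
def activation_volume(spatial, channels):
    """Number of values in a spatial x spatial x channels feature map."""
    return spatial * spatial * channels

# Hypothetical VGG-style stages: pooling halves H/W, filter count doubles.
stages = [(224, 64), (112, 128), (56, 256), (28, 512)]
for spatial, channels in stages:
    print(f"{spatial}x{spatial}x{channels} -> {activation_volume(spatial, channels)} values")
```

Each stage here keeps roughly half the previous stage's volume; without the doubling of channels, each pooling step would discard three quarters of it.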

Prajot Kuvalekar On

There's no rule about descending vs. ascending, but most people follow a descending order. Just try to keep a greater number of neurons in your fully connected part than in your final classification layer.

If you look at the VGG16 architecture, the last layers are in this order: 4096, 4096, 1000. Here 1000 is the number of classes in the ImageNet dataset.
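To get a sense of scale for that 4096, 4096, 1000 head, the parameter count of each fully connected layer is easy to compute by hand. The 7x7x512 flatten size below is VGG16's standard conv output for 224x224 inputs:

```python
def dense_params(n_in, n_out):
    # A Dense layer stores an n_in x n_out weight matrix plus n_out biases.
    return n_in * n_out + n_out

flat = 7 * 7 * 512  # 25088: VGG16's flattened conv output
head = [(flat, 4096), (4096, 4096), (4096, 1000)]
total = sum(dense_params(n_in, n_out) for n_in, n_out in head)
print(total)  # 123642856 -- most of VGG16's ~138M parameters sit in these three layers
```

This is why the fully connected head dominates VGG16's memory footprint, and why later architectures replaced it with global average pooling.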

In your case you can follow this:

model.add(Dense(2048, kernel_regularizer='l2', activation='relu'))
model.add(Dense(1024, kernel_regularizer='l2', activation='relu'))
model.add(Dense(512, kernel_regularizer='l2', activation='relu'))
model.add(Dense(128, kernel_regularizer='l2', activation='relu'))
model.add(Dense(number_classes, activation='softmax'))