Why is the softmax function necessary? Why not simple normalization?

Question

2.6k views Asked by soshi shimada At 30 August 2017 at 16:47

I am not familiar with deep learning, so this might be a beginner question. As I understand it, the softmax function in a multilayer perceptron is responsible for normalizing the outputs and assigning a probability to each class. If so, why don't we use simple normalization?

Let's say we get a vector x = (10 3 2 1). Applying softmax, the output will be y = (0.9986 0.0009 0.0003 0.0001).

Applying simple normalization (dividing each element by the sum, 16), the output will be y = (0.625 0.1875 0.125 0.0625).

It seems like simple normalization could also distribute the probabilities. So what is the advantage of using the softmax function on the output layer?
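The two calculations in the question can be reproduced with a short sketch. It uses only the Python standard library; the function names `softmax` and `simple_normalize` are chosen here for illustration and do not come from any particular framework.

```python
import math

def softmax(values):
    # Subtract the maximum before exponentiating. This does not change
    # the result, but it avoids overflow for large inputs.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def simple_normalize(values):
    # Divide each element by the sum of all elements.
    total = sum(values)
    return [v / total for v in values]

x = [10, 3, 2, 1]
print([round(p, 4) for p in softmax(x)])
print([round(p, 4) for p in simple_normalize(x)])
```

Both outputs sum to 1 for this input, but softmax concentrates almost all of the mass on the largest element, while simple normalization keeps the outputs proportional to the raw values.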

There are 2 answers

Prune On 30 August 2017 at 17:51

This depends on the training loss function. Many models are trained with a log loss (cross-entropy), so the values you see in that vector estimate the log of each probability, up to an additive constant. Thus, SoftMax is merely converting them back to linear values and normalizing.

The empirical reason is simple: SoftMax is used where it produces better results.
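The point about log-probabilities can be illustrated with a small sketch, assuming the logits are log-probabilities shifted by an arbitrary constant. The distribution `p` and the offset `5.0` are hypothetical values picked only for the example.

```python
import math

def softmax(values):
    # Exponentiate (after subtracting the max for stability), then normalize.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical target distribution, chosen only for illustration.
p = [0.7, 0.2, 0.1]

# Log-probabilities plus an arbitrary constant, standing in for logits.
logits = [math.log(q) + 5.0 for q in p]

# Softmax undoes the log and the normalization removes the constant.
recovered = softmax(logits)
print(recovered)
```

Up to floating-point rounding, `recovered` matches `p`: the exponential converts the log-scale values back to linear ones, and the division by the sum cancels the constant offset.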

**Dr. Snoopy** · Accepted Answer · 2017-08-30T19:55:49+00:00

Normalization does not always produce probabilities. For example, it doesn't work when some of the values are negative. And what if the sum of the values is zero?

Taking the exponential of the logits changes that: the exponential is always positive and, in theory, never zero, so the sum cannot be zero and the full range of the logits can be mapped into probabilities. So softmax is preferred because it actually works.
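A minimal sketch of the two failure cases mentioned in this answer, using only the Python standard library. The input vectors `mixed` and `zero_sum` are made-up examples, and the function names are illustrative.

```python
import math

def softmax(values):
    # Exponentiate (after subtracting the max for stability), then normalize.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def simple_normalize(values):
    # Divide each element by the sum of all elements.
    total = sum(values)
    return [v / total for v in values]

mixed = [3.0, -1.0]            # contains a negative value
zero_sum = [2.0, -1.0, -1.0]   # elements sum to zero

# Simple normalization yields a value above 1 and a negative value here.
print(simple_normalize(mixed))

# With a zero sum, simple normalization is undefined.
try:
    simple_normalize(zero_sum)
except ZeroDivisionError:
    print("simple normalization fails: the sum is zero")

# Softmax returns values strictly between 0 and 1 that sum to 1 in both cases.
print(softmax(mixed))
print(softmax(zero_sum))
```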
