The SELU activation function (https://github.com/bioinf-jku/SNNs/blob/master/selu.py) requires the input to be normalized to have a mean of 0.0 and a variance of 1.0. Therefore, I tried to apply tf.layers.batch_normalization (with axis=-1) to the raw data to meet that requirement. The raw data in each batch have the shape [batch_size, 15], where 15 is the number of features. The graph below shows the variances of 5 of these features returned by tf.layers.batch_normalization after about 20 epochs. They are not all close to 1.0 as expected, and the mean values are not all close to 0.0 either (graphs not shown).
How can I get all 15 features normalized independently, so that after normalization every feature has mean = 0.0 and variance = 1.0?
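Roughly, the call looks like this (a minimal sketch assuming TensorFlow 1.x, where tf.layers.batch_normalization is available; the placeholder and variable names are mine):

```python
import tensorflow as tf

# Raw features, shape [batch_size, 15]; names are illustrative only.
x = tf.placeholder(tf.float32, shape=[None, 15], name="raw_features")
is_training = tf.placeholder(tf.bool, name="is_training")

# Normalize each of the 15 features over the batch dimension.
normed = tf.layers.batch_normalization(x, axis=-1, training=is_training)
```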
After reading the original papers on batch normalization (https://arxiv.org/abs/1502.03167) and SELU (https://arxiv.org/abs/1706.02515), I have a better understanding of both:
1. Batch normalization is an "isolation" procedure that ensures the input to the next layer (within any mini-batch) has a fixed distribution, which is how the so-called "internal covariate shift" problem is addressed. The affine transform (γ*x̂ + β) merely tunes the standardized x̂ toward another fixed distribution for better expressiveness. For plain standardization, set the center and scale parameters to False when calling tf.layers.batch_normalization (see the first sketch after this list).

2. Make sure epsilon (also a parameter of tf.layers.batch_normalization) is at least two orders of magnitude smaller than the smallest magnitude in the input data. The default value of epsilon is 0.001; in my case, some features have values as low as 1e-6, so I had to change epsilon to 1e-8.

3. The inputs to SELU have to be normalized before they are fed into the model, and tf.layers.batch_normalization is not designed for that purpose (see the second sketch after this list).
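Putting points 1 and 2 together, the call would look roughly like this (a sketch assuming TensorFlow 1.x; the variable names are placeholders, not the exact code from my model):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 15], name="raw_features")
is_training = tf.placeholder(tf.bool, name="is_training")

# Plain standardization only: disable the learnable affine transform
# (gamma * x_hat + beta) and push epsilon well below the smallest
# feature magnitudes so it does not distort the variance.
x_standardized = tf.layers.batch_normalization(
    x,
    axis=-1,         # treat each of the 15 features independently
    center=False,    # no beta offset
    scale=False,     # no gamma scaling
    epsilon=1e-8,    # default 1e-3 is too large for features around 1e-6
    training=is_training)
```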
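For point 3, a simple alternative is to standardize the raw features once over the whole training set before feeding them to the model, e.g. with NumPy (a sketch; the random train_data array only stands in for the real features and assumes they fit in memory):

```python
import numpy as np

# Stand-in for the real raw features, shape [num_samples, 15].
train_data = np.random.lognormal(size=(10000, 15)).astype(np.float32)

# Per-feature z-score standardization over the full training set.
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data_normed = (train_data - mean) / (std + 1e-8)

# Every column now has mean ~0.0 and variance ~1.0; reuse the same
# mean/std to transform validation and test data.
print(train_data_normed.mean(axis=0))
print(train_data_normed.var(axis=0))
```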