Why does VGG16 use ReLU after each convolution layer?


In the CS231n course, it is said that we want zero-centered data to prevent the local gradient from always having the same sign as the upstream gradient, which causes inefficient, zig-zagging gradient updates. But applying ReLU in every layer produces only non-negative activations as input to the next layer, so how is the inefficient-gradient-update problem solved?
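To make the concern concrete, here is a minimal NumPy sketch of the sign coupling described above (the sizes and values are made up for illustration). For a single neuron y = w·x + b, the gradient of the loss with respect to each weight is dL/dw_i = x_i * dL/dy, so if every x_i is positive, as it is for post-ReLU activations, every component of dL/dw shares the sign of the scalar upstream gradient dL/dy:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(5)        # all-positive inputs, as after a ReLU (illustrative values)
upstream = -0.7          # hypothetical scalar upstream gradient dL/dy

grad_w = x * upstream    # dL/dw_i = x_i * dL/dy

print(grad_w)            # every entry is negative here
# all weight-gradient components share the sign of the upstream gradient:
print(np.all(np.sign(grad_w) == np.sign(upstream)))  # True
```

So within a single update step, all weights of that neuron can only move in the same direction together, which is the zig-zag inefficiency the course warns about.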


There are 0 answers