I'm working on a binary classification problem with a neural network.
I already got good results using the ReLU activation function in the hidden layer and the sigmoid function in the output layer. Now I'm trying to improve on that. I added a second hidden layer with the ReLU activation function, and the results improved. Then I tried the leaky ReLU function for the second hidden layer instead of ReLU and got even better results, but I'm not sure whether this is even allowed.
So I have something like this (a minimal code sketch follows below):

Hidden layer 1: ReLU activation function
Hidden layer 2: leaky ReLU activation function
Output layer: sigmoid activation function
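In code, the setup looks roughly like this (a sketch assuming TensorFlow/Keras; the input size and layer widths are placeholders, not my real values):

```python
import tensorflow as tf

# Sketch of the architecture described above; input size (20) and
# layer widths (64, 32) are placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 1: ReLU
    tf.keras.layers.Dense(32),                        # hidden layer 2 (linear pre-activation)
    tf.keras.layers.LeakyReLU(0.01),                  # leaky ReLU applied to hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output layer: sigmoid
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```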
I can't find many resources on it, and those I found always use the same activation function on all hidden layers.
If you mean the leaky ReLU: yes, mixing activation functions across hidden layers is allowed. In fact, the Parametric ReLU (PReLU) is the activation function that generalizes both the traditional rectified unit and the leaky ReLU. PReLU improves model fitting with no significant extra computational cost and little overfitting risk.
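Concretely, PReLU computes f(x) = x for x > 0 and f(x) = a*x otherwise, where the slope a is learned during training; the leaky ReLU is the special case where a is fixed to a small constant, and the plain ReLU is the case a = 0. As a rough sketch of how you could swap it in for the leaky ReLU in your second hidden layer (assuming Keras; layer sizes are placeholders):

```python
import tensorflow as tf

# Same sketch as in the question, but with the fixed-slope leaky ReLU
# replaced by PReLU, whose negative slope is a trainable parameter
# (learned per unit by default in Keras).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 1: ReLU
    tf.keras.layers.Dense(32),                        # hidden layer 2
    tf.keras.layers.PReLU(),                          # f(x) = x if x > 0 else a * x, with a learned
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output layer: sigmoid
])
```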
For more details, you can check out the paper Delving Deep into Rectifiers (He et al., 2015).