Sorry for asking this silly question. I am experimenting with the Keras framework, and due to convergence issues in a much more involved set-up, I am now proceeding step-by-step.
I set up a very simple one-node neural net with a ReLU activation. Depending on how I set it up, however, the ReLU either behaves as expected or, incorrectly, acts as a linear identity mapping.
Solution 1: input node -> identity pass-through to a node with ReLU activation -> identity pass-through to output node [black curve in the picture below]
Solution 2: input node -> identity pass-through to output node with ReLU activation [red = blue curve in the picture below]
Solution 3: input node -> identity pass-through -> ReLU activation -> identity pass-through to output node [blue = red curve in the picture below]
Any clue as to why solution 1 does not work? [The red and blue curves overlap in the output picture below.]
I find it worrying that the ReLU function behaves differently depending on where and how it is placed in the network.
NB: GELU, sigmoid, etc. do not seem to be affected by this issue; just set mm = "sigmoid" or mm = "gelu" below.
#### load libraries
library(tensorflow)
library(keras)
#### define a simple test grid: 1,000 points from just above -5 up to 5
x = as_tensor(-5 + 10 * (1:1e3) / 1e3, dtype = tf$float32)
#### identity weights for a direct pass-through: 1x1 kernel of 1 and bias of 0
dum1 = list(matrix(1, 1, 1), array(0, dim = 1))
mm = "relu"
#### does not work as planned; yields a linear response, not ReLU ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = mm, weights = dum1) %>%  # node with ReLU activation, identity weights
  layer_dense(1, weights = dum1)                       # linear output node, identity weights
plot(x, predict(model, x), type = "l", col = "black")
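As a quick sanity check on this first set-up, it may be worth confirming that the weights = dum1 argument really ended up in both dense layers; a minimal sketch using get_weights() from the keras R package, run right after the block above:
#### sanity check (sketch): both dense layers should hold a 1x1 kernel of 1 and a length-1 bias of 0
str(get_weights(model))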
#### works as planned ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_dense(1, activation = mm, weights = dum1)  # single output node with ReLU activation, identity weights
lines(x, predict(model, x), type = "l", col = "red")
#### works as planned ####
model <- keras_model_sequential(input_shape = c(1, 1)) %>%
  layer_flatten() %>%
  layer_activation_relu() %>%     # stand-alone ReLU layer
  layer_dense(1, weights = dum1)  # linear output node, identity weights
lines(x, predict(model, x), type = "l", col = "blue")
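To go beyond eyeballing the plot, the predictions can also be compared against a plain ReLU computed in base R; a minimal sketch for whichever model was built last (the same check can be repeated after each of the blocks above):
#### reference check (sketch): compare the latest model's output to ReLU(x) computed in base R
y_hat <- as.numeric(predict(model, x))
y_ref <- pmax(as.numeric(as.array(x)), 0)
max(abs(y_hat - y_ref))  # should be (near) zero if the network really computes ReLU(x)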
[Output plot of this code: the red and blue curves overlap.]
I googled for different answers and manuals to no avail. Above is my issue stripped to the bare essentials.
With all weights set to 1, the output of the neuron before the ReLU is just the sum of its inputs.
Now consider the effect of the ReLU on this sum of inputs:
If the sum of the inputs (here just x itself, since each sample has a single input, and some of the x values are negative) is negative, the ReLU sets it to zero; if the sum is positive, the ReLU leaves it unchanged.
Hence, with the weights set to 1, the ReLU activation in this scenario behaves as a linear transformation for positive or zero inputs and sets negative inputs to zero, which might give the appearance of a linear response across the range of input values.
Note: I do not have R on my computer; could you check whether this is the reason by monitoring the sum of your inputs?
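If it helps, here is a rough, untested sketch of how that intermediate value could be monitored with the keras R package; it assumes the first model from the question has just been built, and uses keras_model() and get_layer() to expose the output of the dense layer that carries the ReLU:
#### sketch: expose the output of the ReLU-activated dense layer and inspect it
probe <- keras_model(inputs = model$input,
                     outputs = get_layer(model, index = 2)$output)  # adjust the index if it does not point at the ReLU layer
summary(as.numeric(predict(probe, x)))  # if the ReLU is applied, no negative values should appear here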