Do neural network layers with a constant input learn weights?


I'm trying to make a network whose input lingers and decays. The raw input is a vector in which each element is 0, 1, or -1. I'm curious whether there is any value in inputs being active simultaneously, so I would like each input to decay from 1 or -1 back to 0 rather than dropping to 0 on the next iteration; a crude form of memory, I guess. An example of what I mean:

Normal input:
1 -> 0 -> 0 -> -1 -> 0 ...
With decay .2:
1 -> .8 -> .6 -> -1 -> -.8 ...

This is easy to do manually by adding an extra input that takes a vector of decay values, but I want to know whether the network can learn its own values here, so that it can give smaller decays to inputs that are more important.
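For reference, the manual version described above can be sketched in plain Python. The function name `decay_inputs` and the list-of-lists representation are assumptions made for illustration only; each element is reset whenever a non-zero raw input arrives and otherwise moves toward 0 by its own decay value.

```python
def decay_inputs(sequence, decays):
    """Apply a per-element subtractive decay to a sequence of input vectors.

    sequence: list of vectors whose elements are 0, 1, or -1 (raw inputs)
    decays:   one non-negative decay value per element (hypothetical, hand-picked)
    """
    state = [0.0] * len(decays)
    decayed = []
    for vec in sequence:
        new_state = []
        for x, s, d in zip(vec, state, decays):
            if x != 0:
                # A fresh activation overrides whatever was lingering.
                new_state.append(float(x))
            elif s > 0:
                # Positive value decays down toward 0, clamped at 0.
                new_state.append(max(0.0, s - d))
            else:
                # Negative (or zero) value decays up toward 0, clamped at 0.
                new_state.append(min(0.0, s + d))
        state = new_state
        decayed.append(state)
    return decayed


# The single-element example from the question, with decay 0.2.
result = decay_inputs([[1], [0], [0], [-1], [0]], [0.2])
print([round(v[0], 2) for v in result])  # [1.0, 0.8, 0.6, -1.0, -0.8]
```

Passing a different value per element in `decays` is what the learned version would replace.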

Since each neuron outputs one value, it is possible to have N neurons (one for each required decay value) and pass each of them a constant input of 1, so that each neuron simply outputs its weight. That output could be run through a sigmoid activation and then used as the decay value.

Will this layer learn weights given that its input is always 1? If not, is there a way to do this?

NOTES: The data is sequential, which is why I assume that the activations could affect each other. I am aware that recurrent networks are designed to have memory, but I don't know if I have enough data for one to learn the relationships. Also, this custom decay function eventually reaches 0 because it subtracts the decay; multiplying by a small weight would only approach 0 asymptotically, which, if I understand correctly, is what an RNN would do.
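The difference between the two decay styles mentioned in the notes can be checked with a small sketch. The helper `steps_to_zero` and the value 0.25 are illustrative assumptions (0.25 is chosen because it is exactly representable in floating point).

```python
def steps_to_zero(step_fn, start=1.0, max_steps=1000):
    """Return how many steps step_fn needs to bring start to exactly 0.0,
    or None if that does not happen within max_steps."""
    value = start
    for n in range(1, max_steps + 1):
        value = step_fn(value)
        if value == 0.0:
            return n
    return None


def subtractive(v):
    # Subtract a fixed amount each step, clamped at 0.
    return max(0.0, v - 0.25)


def multiplicative(v):
    # Scale by a fixed factor each step; shrinks but stays positive.
    return v * 0.75


print(steps_to_zero(subtractive))     # 4
print(steps_to_zero(multiplicative))  # None
```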

Answer by Aniket Bote (best answer)

You can create this type of architecture easily using the TensorFlow (Keras) functional API.

Dataset and model creation code:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Generating features
np.random.seed(100)
x1 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
x2 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
x3 = tf.constant(np.ones(shape =(100,1)), dtype = tf.float32)
y = tf.constant(np.random.randint(2, size =(100,)), dtype = tf.float32)

def create_model():
    input1 = tf.keras.Input(shape=(1,))
    input2 = tf.keras.Input(shape=(1,))
    input3 = tf.keras.Input(shape=(1,))
    hidden1 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input1)
    hidden2 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input2)
    hidden3 = tf.keras.layers.Dense(units = 1, activation='sigmoid', use_bias = False)(input3)
    
    merge = tf.keras.layers.concatenate([hidden1,hidden2,hidden3])
    
    hidden4 = tf.keras.layers.Dense(units = 4, activation='sigmoid')(merge)
    output1 = tf.keras.layers.Dense(units = 2, activation='softmax')(hidden4)
    
    model = tf.keras.models.Model(inputs = [input1, input2, input3], outputs = output1, name= "functional1")
    
    return model
model = create_model()

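# Set the initial raw weights. Because each layer uses a sigmoid activation
# and its input is the constant 1, the effective decay is sigmoid(weight).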
model.layers[3].set_weights([tf.constant([[0.8]])])
model.layers[4].set_weights([tf.constant([[0.8]])])
model.layers[5].set_weights([tf.constant([[0.8]])])

# Plotting the model requires the pydot package and Graphviz to be installed.
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)

Your model looks like this:

[Image: model architecture diagram]

Training process:

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=10)
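# Instantiate a loss function.
# Note: the model's last layer already applies softmax, so from_logits=False
# would be the consistent setting. from_logits=True is kept here because it
# is the configuration shown alongside the output below.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)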
epochs = 50
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))


    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:

        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model([x1,x2,x3], training=True)  # Model outputs (softmax probabilities) for the whole batch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)
    print('Gradients of- Decay 1: {}  Decay 2: {}  Decay 3: {}'.format(grads[0].numpy()[0][0], grads[1].numpy()[0][0], grads[2].numpy()[0][0]))

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    # Log every epoch.
    print("Training loss (for one batch) at epoch %d: %.4f" % (epoch, float(loss_value)))
    print('------------------------------')

Output:

Start of epoch 0
Gradients of- Decay 1: -0.001539231976494193  Decay 2: 0.0013862588675692677  Decay 3: -0.0024916294496506453
Training loss (for one batch) at epoch 0: 0.7312
------------------------------

Start of epoch 1
Gradients of- Decay 1: 0.0015823811991140246  Decay 2: -0.00021153852867428213  Decay 3: 0.0008941243286244571
Training loss (for one batch) at epoch 1: 0.7042
------------------------------

Start of epoch 2
Gradients of- Decay 1: -0.0013041968923062086  Decay 2: 0.0005898184608668089  Decay 3: -0.0015725962584838271
Training loss (for one batch) at epoch 2: 0.7039
------------------------------

Start of epoch 3
Gradients of- Decay 1: 0.00156548956874758  Decay 2: -0.00017016787023749202  Decay 3: 0.000881993502844125
Training loss (for one batch) at epoch 3: 0.7045
------------------------------

Start of epoch 4
Gradients of- Decay 1: -0.0012605276424437761  Decay 2: 0.00047704551252536476  Decay 3: -0.0015090997330844402
Training loss (for one batch) at epoch 4: 0.7028
------------------------------

Start of epoch 5
Gradients of- Decay 1: 0.0014193064998835325  Decay 2: -0.0001368212979286909  Decay 3: 0.0008420557714998722
Training loss (for one batch) at epoch 5: 0.7027
------------------------------

Start of epoch 6
Gradients of- Decay 1: -0.0011729025281965733  Decay 2: 0.0003637363843154162  Decay 3: -0.0013745202450081706
Training loss (for one batch) at epoch 6: 0.7011
------------------------------

Start of epoch 7
Gradients of- Decay 1: 0.0012617181055247784  Decay 2: -0.00010974107135552913  Decay 3: 0.0007924885721877217
Training loss (for one batch) at epoch 7: 0.7007
------------------------------

Start of epoch 8
Gradients of- Decay 1: -0.0010727590415626764  Decay 2: 0.000274341378826648  Decay 3: -0.0012277730274945498
Training loss (for one batch) at epoch 8: 0.6995
------------------------------

Start of epoch 9
Gradients of- Decay 1: 0.0011162457522004843  Decay 2: -8.809947757981718e-05  Decay 3: 0.0007380791357718408
Training loss (for one batch) at epoch 9: 0.6991
------------------------------

Start of epoch 10
Gradients of- Decay 1: -0.0009710552403703332  Decay 2: 0.00020754436263814569  Decay 3: -0.001086110481992364
Training loss (for one batch) at epoch 10: 0.6982
------------------------------

The final values of the decay weights:

print(model.layers[3].get_weights())
print(model.layers[4].get_weights())
print(model.layers[5].get_weights())

Output:

[array([[0.7963085]], dtype=float32)]
[array([[0.7707753]], dtype=float32)]
[array([[0.8614942]], dtype=float32)]
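Note that these printed numbers are the raw layer weights. Since each of those layers was built with `activation='sigmoid'`, `use_bias=False`, and a constant input of 1, the value the layer actually outputs is `sigmoid(weight)`. A minimal sketch of that conversion, using only the standard library (the variable names are illustrative):

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Raw weights copied from the printed output above.
raw_weights = [0.7963085, 0.7707753, 0.8614942]

# Each layer computes sigmoid(weight * input) with input fixed at 1.
effective_decays = [sigmoid(w * 1.0) for w in raw_weights]
print([round(d, 3) for d in effective_decays])
```

All three effective values land a little below 0.7 to just above it, rather than near 0.8.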

Things to remember:

Learning depends not only on the input but also on the target output. Both the target and the predicted output appear in the gradient of the loss, so a constant input of 1 does not make the gradient vanish. As long as the predictions differ from the targets, the decay weights keep receiving non-zero gradients and learning still takes place.
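To make that point concrete, here is a hand-computed gradient for a single decay neuron. This is only a toy sketch: the squared-error loss, the target of 0.2, and the learning rate of 0.5 are assumptions chosen for illustration, not part of the model above.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_wrt_weight(w, target, x=1.0):
    """Gradient of the toy loss (sigmoid(w * x) - target)**2 with respect to w.

    By the chain rule: 2 * (d - target) * d * (1 - d) * x, with d = sigmoid(w * x).
    """
    d = sigmoid(w * x)
    return 2.0 * (d - target) * d * (1.0 - d) * x


w = 0.8  # same starting weight as in the answer's code
for step in range(3):
    g = grad_wrt_weight(w, target=0.2)  # hypothetical target decay
    w -= 0.5 * g                        # hypothetical learning rate
    print(step, round(g, 4), round(w, 4))

# With a constant input of 0 the gradient would vanish; with 1 it does not.
print(grad_wrt_weight(0.8, target=0.2, x=0.0))  # 0.0
```

The input `x` appears as a plain multiplicative factor in the gradient, so a constant 1 leaves it intact while a constant 0 would stop the weight from updating. A weight fed a constant 1 behaves like a bias term.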