I am new to PyTorch and am trying to build a graph neural network with PyTorch Geometric. My data is a time series of graphs: say time=1000 graphs, each with num_nodes=246 nodes and num_features=185 features per node, so the data has the shape [246,185,1000]. The adjacency matrix is the same for all 1000 graphs and never changes; neither does the number of features. All I want is to predict the feature values of the graph at time t from the graph at time t-1 (or from several earlier graphs, see further below). I built a model that looks like this:
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GNNModel(nn.Module):
    def __init__(self, num_features, hidden_channels):
        # hidden_channels: sequence of four layer widths
        super(GNNModel, self).__init__()
        # Convolutional Message Passing Layers
        self.conv1 = GCNConv(num_features, hidden_channels[0])
        self.conv2 = GCNConv(hidden_channels[0], hidden_channels[1])
        self.conv3 = GCNConv(hidden_channels[1], hidden_channels[2])
        self.conv4 = GCNConv(hidden_channels[2], hidden_channels[3])
        self.conv5 = GCNConv(hidden_channels[3], num_features)
        # Dense layer for regression (num_features -> 32 -> 1, applied per node)
        self.fc = nn.Sequential(nn.Linear(num_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x, edge_index):
        # x: [num_nodes, num_features], edge_index: [2, num_edges]
        # Message Passing Layers (GCNConv)
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        x = F.relu(x)
        x = self.conv3(x, edge_index)
        x = F.relu(x)
        x = self.conv4(x, edge_index)
        x = F.relu(x)
        x = self.conv5(x, edge_index)
        x = F.relu(x)
        x = self.fc(x)
        return x
I define X=data[:,:,:-1] and Y=data[:,:,1:], so Y[:,:,0]=X[:,:,1] is the label for X[:,:,0], since the data is a time series of graphs. I experimented with the number of GCNConv layers, the hidden channels, the number of dense layers at the end and so on, and found that, interestingly, the results are much better if I delete the last two lines before the return in the forward function (x = F.relu(x) and x = self.fc(x)). In fact, with the fully connected layers at the end, the predictions after training don't make any sense at all. So my first question is whether anyone can explain why, and advise me on what I am doing wrong.
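For concreteness, here is a minimal sketch of that one-step pairing. It uses NumPy with random numbers as a stand-in for the real data (the same slicing works on torch tensors), the time axis is shortened to 20 steps to keep it light, and the helper name `one_step_pairs` is made up for this illustration.

```python
import numpy as np

def one_step_pairs(data):
    """Split [num_nodes, num_features, time] into inputs and next-step labels."""
    X = data[:, :, :-1]  # graphs at t = 0 .. time-2
    Y = data[:, :, 1:]   # graphs at t = 1 .. time-1
    return X, Y

# Random stand-in with the node/feature sizes from the post (not real data).
rng = np.random.default_rng(0)
data = rng.normal(size=(246, 185, 20))
X, Y = one_step_pairs(data)

print(X.shape, Y.shape)
# The label of sample t is the input of sample t + 1.
print(np.array_equal(Y[:, :, 0], X[:, :, 1]))
```

Each sample X[:, :, t] and its label Y[:, :, t] both have the shape [246, 185], which is the shape a prediction has to have in order to be compared with the label feature by feature.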
I also wanted to structure my training data as follows: X_0 = data[:,:,:n_past], Y_0 = data[:,:,n_past], so that, for example, the first 5 graphs are used to predict the 6th graph, and so on. But then my X_train would have the shape [n_past,246,185,1000-n_past-1], and I couldn't find a way to make GCNConv layers accept input with 4 dimensions (or 3, not counting the batch).
So I thought I could make each input sample a single graph with num_nodes_x = 1230 (= 5 * 246) nodes by stacking graph_(t-5), graph_(t-4), graph_(t-3), graph_(t-2) and graph_(t-1), with graph_(t) as the label. The next question was what to do with the adjacency matrix. I stacked it into a 5 x 5 block matrix consisting of 25 copies of the original adjacency matrix. Now the adjacency matrix for the inputs x was obviously much larger than the one for the outputs y, and my output graphs suddenly also had 1230 nodes instead of just 246.
I got my model to deal with that by reshaping the output of the last GCNConv layer from [num_nodes_x, num_features] = [1230, 185] to [num_nodes_y, 5 * num_features] = [246, 925] and then reducing the 925 back to 185 in the fully connected layers. However, the outputs of this model after training looked horrible, and one could see the 5 x 5 block structure when plotting via
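# detach so the prediction can be converted to a NumPy array for plotting
Y_pred = model(X_test[:, :, 0], edge_index).detach()
# x, y: per-node plot coordinates (length num_nodes); k: feature index
plt.scatter(x, y, c=Y_pred[:, k].cpu().numpy())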
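To make the shapes in this setup concrete, the following is a minimal NumPy sketch with tiny stand-in sizes. It rests on assumptions the post does not state: that the stacking is a plain concatenation along the node axis, and that the reshape is an ordinary row-major reshape/view. The helpers `make_windows` and `tile_edge_index` and the 3-node example graph are hypothetical, and the sketch keeps every window whose label exists, which gives time - n_past samples.

```python
import numpy as np

def make_windows(data, n_past):
    """Turn [num_nodes, num_features, time] into sliding windows.

    Returns X with shape [num_windows, n_past, num_nodes, num_features]
    and Y with shape [num_windows, num_nodes, num_features].
    """
    seq = np.moveaxis(data, -1, 0)  # [time, num_nodes, num_features]
    num_windows = seq.shape[0] - n_past
    X = np.stack([seq[s:s + n_past] for s in range(num_windows)])
    Y = seq[n_past:]  # Y[s] is the graph right after window X[s]
    return X, Y

def tile_edge_index(edge_index, num_nodes, n_past):
    """Edge list of the n_past x n_past block matrix made of copies of A."""
    blocks = [
        edge_index + np.array([[i * num_nodes], [j * num_nodes]])
        for i in range(n_past)
        for j in range(n_past)
    ]
    return np.concatenate(blocks, axis=1)

# Tiny stand-in sizes; entry value = 100*t + 10*node + feature.
num_nodes, num_features, time, n_past = 3, 2, 6, 2
t, n, f = np.meshgrid(np.arange(time), np.arange(num_nodes),
                      np.arange(num_features), indexing="ij")
data = np.moveaxis(100 * t + 10 * n + f, 0, -1)  # [num_nodes, num_features, time]

X, Y = make_windows(data, n_past)

# Stack one window along the node axis: [n_past * num_nodes, num_features].
stacked = X[0].reshape(n_past * num_nodes, num_features)

# Hypothetical 3-node path graph 0-1-2 as [2, num_edges] (both directions).
edge_index = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])
big_edge_index = tile_edge_index(edge_index, num_nodes, n_past)

# Plain row-major reshape: each row packs consecutive stacked rows.
naive = stacked.reshape(num_nodes, n_past * num_features)
# Grouping by node needs the time axis moved next to the features first.
per_node = (stacked.reshape(n_past, num_nodes, num_features)
                   .transpose(1, 0, 2)
                   .reshape(num_nodes, n_past * num_features))

print(X.shape, Y.shape, big_edge_index.shape)
print(naive[0])     # node 0 and node 1, both at the first time step
print(per_node[0])  # node 0 at the first and second time step
```

In this sketch the plain reshape puts features of different nodes into the same row, whereas the transposed version keeps one row per node with its n_past time steps side by side. Whether that matches the reshape actually used in the model is an assumption.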
So I think that reshaping was not a good idea. Does anyone have a good idea of what I could try instead if I want to use several past time steps to predict the current one? Or would you say that is entirely unnecessary? Either way, I still find it weird that the fully connected layers at the end seem to make everything worse. I would be really happy about some input.