Reproducibility of the results for GNN using DGL GraphSAGE

Question

303 views Asked by Luisa Roa At 12 September 2024 at 20:39

I'm working on a node classification problem using GraphSAGE. I'm new to GNNs, so my code is based on the DGL GraphSAGE classification tutorials [1] and [2]. This is the code I'm using; it's a 3-layer GNN with input size 20 and output size 2 (a binary classification problem):

import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.nn as dglnn

class GraphSAGE(nn.Module):
    def __init__(self,in_feats,n_hidden,n_classes,n_layers,
                 activation,dropout,aggregator_type):
        super(GraphSAGE, self).__init__()
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        self.activation = activation

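        # Input layer: in_feats -> n_hidden
        self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, aggregator_type))
        # n_layers - 1 hidden layers: n_hidden -> n_hidden
        for i in range(n_layers - 1):
            self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, aggregator_type))
        # Output layer: n_hidden -> n_classes (n_layers + 1 SAGEConv layers in total)
        self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, aggregator_type))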

    def forward(self, graph, inputs):
        h = self.dropout(inputs)
        for l, layer in enumerate(self.layers):
            h = layer(graph, h)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
        return h

modelG = GraphSAGE(in_feats=n_features, #20
                   n_hidden=16,
                   n_classes=n_labels, #2
                   n_layers=3,
                   activation=F.relu,
                   dropout=0,
                   aggregator_type='mean')

opt = torch.optim.Adam(modelG.parameters())

for epoch in range(50):
    modelG.train() 

    logits = modelG(g, node_features)
    
    loss = F.cross_entropy(logits[train_mask], node_labels[train_mask])
    
    acc = evaluate(modelG, g, node_features, node_labels, valid_mask)
    
    opt.zero_grad()
    loss.backward()
    opt.step()
    
    if epoch % 5 == 0:
        print('In epoch {}, loss: {:.4f}'.format(epoch, loss.item()))

Every time I train the model (without changing anything), the performance changes a lot: the accuracy varies between 0.45 and 0.87. How can I guarantee the reproducibility of the results? I have tried setting the PyTorch seed with torch.manual_seed(), setting the NumPy seed, and setting dropout to 0, but the results keep varying. Is this normal, or am I missing something?

There is 1 answer

**Ana** · Answer 1 · 2023-11-07 21:25:38

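I saw similar issues online, and the solution was to use:

torch.use_deterministic_algorithms(True)

(In PyTorch 1.7 this call was named torch.set_deterministic(True); it was deprecated in PyTorch 1.8 in favor of the name above.)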

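The likely cause is the scatter operations: on the GPU they use atomic operations under the hood, so the order of summation can vary each time you run the code. Because floating-point addition is not associative, a different order can produce slightly different sums, and those differences can grow over training.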
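
To illustrate why summation order matters, here is a minimal pure-Python sketch (not DGL or GPU code; the values and variable names are chosen only for illustration). It adds the same three numbers in two different groupings, which is roughly what happens when atomic additions land in a different order between runs:

```python
# Floating-point addition is not associative: the same numbers
# added in a different order can give a slightly different result.
values = [0.1, 0.2, 0.3]

left_first = (values[0] + values[1]) + values[2]
right_first = values[0] + (values[1] + values[2])

print(left_first == right_first)      # False
print(abs(left_first - right_first))  # a tiny, non-zero difference
```

The difference per operation is tiny, but repeated over many aggregations and gradient steps it can push training onto a different trajectory.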

Also, set this environment variable in the terminal when launching your script:

CUBLAS_WORKSPACE_CONFIG=:16:8 python file_name.py
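
As a rough sketch of how these pieces might be combined inside the script itself, the helper below seeds the common random number generators and turns on deterministic algorithms. The function name make_reproducible and the seed value are hypothetical, not part of any library. Setting CUBLAS_WORKSPACE_CONFIG from Python is an assumption that only holds if it happens before any CUDA work starts; setting it on the command line as above is the safer option. The dgl.seed call is only attempted if DGL is installed.

```python
import os
import random

# Assumption: this runs before any CUDA work, so cuBLAS picks it up.
# setdefault keeps a value that was already set in the terminal.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")

import numpy as np
import torch


def make_reproducible(seed: int = 0) -> None:
    """Hypothetical helper: seed the RNGs and request deterministic ops."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.benchmark = False
    # PyTorch ops without a deterministic implementation will raise an error.
    torch.use_deterministic_algorithms(True)
    try:
        import dgl
        dgl.seed(seed)  # seeds DGL's own random number generator
    except ImportError:
        pass  # DGL not installed; skip


# Re-seeding with the same value should reproduce the same random draws.
make_reproducible(42)
first = torch.rand(3)
make_reproducible(42)
second = torch.rand(3)
print(torch.equal(first, second))
```

Call the helper once at the top of the training script, before the model and optimizer are created, so that weight initialization is also covered by the seed.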

References: https://pytorch.org/docs/stable/notes/randomness.html and https://github.com/pyg-team/pytorch_geometric/issues/859
