Memory consumption increases while experimenting with a custom architecture in PyTorch


I'm currently training a custom architecture tailored for a specific problem, and so far, the architecture is performing flawlessly. However, I'm encountering a memory consumption issue while running an experiment to calculate the loss based on different attributes. Essentially, I'm selecting an attribute and computing the loss associated with that attribute.

The problem:

Memory consumption increases significantly while this experiment runs. Despite my efforts to analyze the situation, I have been unable to identify the root cause.

Updated: After each iteration over an attribute combination, RAM consumption increases, and the increase accumulates over time. Around the twentieth combination, memory reaches maximum capacity and the process crashes. Since I'm overwriting the variables on every iteration, I would not expect this accumulation.

Here is part of my code:

import time

import torch

prev_size = 1
for attribute_item in combined_attributes:
    NUM_FEATURES = len(attribute_item)
    print(attribute_item)
    # reset the model and optimizer only when the number of features changes
    if NUM_FEATURES != prev_size:
        reset_model_optimizer()
        prev_size = NUM_FEATURES

    start_time = time.time()
    for epoch in range(50):
        # Create a generator object for data batches
        data_gen = data_generator(trainloader)

        for i, examples in enumerate(data_gen, 0):
            (inputs, desired_output) = examples

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward
            output_predictions = model(inputs)

            # loss
            desired_output = desired_output.to(output_predictions.device)
            loss = torch.nn.MSELoss()(output_predictions, desired_output)

            # backward + optimize
            loss.backward()
            optimizer.step()

            # record the loss as a plain Python float
            output_loss.append(loss.item())

    output_attribute.append(attribute_item)
    # store the epoch indices as a list rather than an unevaluated generator object
    output_epoch.append(list(range(50)))

    end_time = time.time()
    execution_time = end_time - start_time
    minutes = int(execution_time // 60)
    seconds = int(execution_time % 60)
    time_format = "{:02d}:{:02d}".format(minutes, seconds)
    print("Time elapsed:", time_format)

    output_attribute = []
    output_epoch = []
    output_loss = []

    del inputs  
    del desired_output
    del output_predictions
    del loss
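
The resets and `del` statements at the end of the loop only unbind names. An object is released only once nothing else refers to it. The following is a minimal sketch of that behaviour in plain Python, not a diagnosis of the code above: `Payload`, `history`, and `is_alive` are hypothetical names introduced for illustration, and `Payload` merely stands in for a large object such as a tensor or a model.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object such as a tensor or a model."""

    def __init__(self, size):
        self.data = bytearray(size)


def is_alive(ref):
    """Return True while the object behind a weak reference still exists."""
    return ref() is not None


history = []                  # hypothetical container that outlives the loop body

obj = Payload(1_000_000)
probe = weakref.ref(obj)      # observes the object without keeping it alive
history.append(obj)           # a second reference, e.g. an appended result

del obj                       # removes only the name `obj`
gc.collect()
still_alive_after_del = is_alive(probe)   # `history` still references the object

history = []                  # rebinding drops the last remaining reference
gc.collect()
alive_after_reset = is_alive(probe)       # nothing references the object any more

print(still_alive_after_del, alive_after_reset)
```

If memory keeps growing even though every name is deleted or rebound, one thing to check is whether some other container, closure, or module-level object still holds a reference to the data from earlier combinations.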

Although I have already checked the memory consumption of each variable used in the code, I've not been able to identify the variable or part of the code responsible for the memory increase.
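
Instead of checking variables one by one, the growth per attribute combination could be logged directly. The sketch below assumes the standard-library `tracemalloc` module. `run_combination` and `measure_growth` are hypothetical helpers, and the deliberate leak only simulates state that survives between iterations. Note that `tracemalloc` sees only allocations made through Python's allocators, so native tensor storage would need a process-level measurement instead.

```python
import tracemalloc


def run_combination(store):
    """Hypothetical stand-in for one attribute combination; leaks on purpose."""
    store.append(bytearray(200_000))


def measure_growth(num_combinations):
    """Return the net growth in traced bytes caused by each combination."""
    leaked = []               # simulates state that survives between iterations
    growth = []
    tracemalloc.start()
    previous, _peak = tracemalloc.get_traced_memory()
    for _ in range(num_combinations):
        run_combination(leaked)
        current, _peak = tracemalloc.get_traced_memory()
        growth.append(current - previous)
        previous = current
    tracemalloc.stop()
    return growth


growth = measure_growth(3)
for index, delta in enumerate(growth):
    print(f"combination {index}: {delta / 1024:.0f} KiB net growth")
```

A combination that releases everything it allocates should show net growth close to zero. When the growth is steadily positive, `tracemalloc.take_snapshot()` together with `Snapshot.compare_to()` can attribute it to individual source lines.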

I would greatly appreciate any insights or assistance in troubleshooting this memory consumption problem. If necessary, the complete code is available for reference in my Google Colab notebook.

Thanks in advance for any help.
