I'm currently training a custom architecture tailored to a specific problem, and so far it has performed flawlessly. However, I'm running into a memory-consumption issue while running an experiment that computes the loss for different attributes: essentially, I select an attribute combination and compute the loss associated with it.
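For context, combined_attributes is an iterable of attribute subsets, roughly like the sketch below (the attribute names here are made up; only the shape of the data matters):

from itertools import combinations

# Hypothetical attribute names; the real ones come from my dataset
attributes = ["age", "height", "weight", "income"]

# Every non-empty subset, ordered from smallest to largest,
# so NUM_FEATURES grows as the training loop progresses
combined_attributes = [
    combo
    for size in range(1, len(attributes) + 1)
    for combo in combinations(attributes, size)
]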
The problem:
The main issue is a significant increase in memory consumption while the experiment runs. Despite my efforts to analyze the situation, I've been unable to identify the root cause of the spike.
Update: after each iteration over an attribute combination, RAM consumption increases, and the growth accumulates over time. Around the twentieth combination, memory is exhausted and the process crashes. Since I overwrite the variables on every iteration, this accumulation shouldn't occur.
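For reference, this is how I observe the per-combination growth (a minimal sketch, assuming psutil is installed; log_rss is a helper name I made up):

import os
import psutil

_process = psutil.Process(os.getpid())

def log_rss(tag):
    # Print the current resident set size (RSS) of this process in MiB
    rss_mib = _process.memory_info().rss / (1024 ** 2)
    print(f"{tag}: RSS = {rss_mib:.1f} MiB")

With a call like log_rss(str(attribute_item)) at the end of each combination, the reading climbs steadily until the crash.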
Here is part of my code:
import time
import torch

prev_size = 1
criterion = torch.nn.MSELoss()  # build the loss module once instead of once per batch
output_attribute = []
output_epoch = []
output_loss = []
for attribute_item in combined_attributes:
    NUM_FEATURES = len(attribute_item)
    print(attribute_item)
    # Re-initialize the model and optimizer whenever the feature count changes
    if NUM_FEATURES != prev_size:
        reset_model_optimizer()
        prev_size = NUM_FEATURES
    start_time = time.time()
    for epoch in range(50):
        # Create a generator object for data batches
        data_gen = data_generator(trainloader)
        for i, examples in enumerate(data_gen, 0):
            (inputs, desired_output) = examples
            # zero the parameter gradients
            optimizer.zero_grad()
            # forward
            output_predictions = model(inputs)
            # loss
            desired_output = desired_output.to(output_predictions.device)
            loss = criterion(output_predictions, desired_output)
            # backward + optimize
            loss.backward()
            optimizer.step()
        # New: record this epoch's last batch loss, with the attribute and epoch index
        output_loss.append(loss.item())
        output_attribute.append(attribute_item)
        output_epoch.append(epoch)
    end_time = time.time()
    execution_time = end_time - start_time
    minutes = int(execution_time // 60)
    seconds = int(execution_time % 60)
    time_format = "{:02d}:{:02d}".format(minutes, seconds)
    print("Time elapsed:", time_format)
    # Reset the result lists and release the last batch's tensors
    # before moving on to the next attribute combination
    output_attribute = []
    output_epoch = []
    output_loss = []
    del inputs
    del desired_output
    del output_predictions
    del loss
Although I have already checked the memory consumption of each variable used in the code, I haven't been able to identify the variable, or the part of the code, responsible for the growth.
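If someone wants to reproduce the analysis, a snapshot comparison with Python's standard-library tracemalloc is one way to look for growing Python-side allocations (a sketch; the snapshot placement around a single combination is illustrative):

import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run a single attribute combination here ...

after = tracemalloc.take_snapshot()
# The ten allocation sites that grew the most between the two snapshots
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)

One caveat I'm aware of: tracemalloc only tracks allocations made through Python's allocator, so memory held by PyTorch's C++ tensor storage won't appear in these statistics.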
I would greatly appreciate any insights or assistance in troubleshooting this memory-consumption problem. If necessary, the complete code is available for reference in my notebook: Google Colab
Thanks in advance for any help.