How to continue training a DETR model from the last epoch using checkpoints?


How can I continue training DETR from the last epoch using checkpoints? I am using Google Colab and I can't train for all 200 epochs in one session. This is my training code:

from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning import Trainer

# Define your DETR model, dataset, and other necessary elements
MAX_EPOCHS = 200

early_stopping_callback = EarlyStopping(
    monitor='training_loss',  # Metric to monitor for early stopping
    min_delta=0.00,  # Minimum change in the monitored metric to count as an improvement
    patience=3,  # Number of epochs to wait for improvement before stopping
    mode='min'  # Training loss should decrease, so treat it as a minimization metric
)

trainer = Trainer(
    devices=1,
    accelerator="gpu",
    max_epochs=MAX_EPOCHS,
    gradient_clip_val=0.1,
    accumulate_grad_batches=8,
    log_every_n_steps=5,
    callbacks=[early_stopping_callback]
)

trainer.fit(model)

I tried loading the last checkpoint and the model, but it didn't work.


1 Answer

Answered by Anna Andreeva Rogotulka

You can load the latest checkpoint into the model:

checkpoint = torch.load('path/to/last_checkpoint.pth')  # path to your saved checkpoint file
model.load_state_dict(checkpoint['model'])

and don't forget to restore the states of the optimizer and scheduler so training resumes smoothly:

optimizer.load_state_dict(checkpoint['optimizer'])
lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
start_epoch = checkpoint['epoch'] + 1
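
The code above assumes the checkpoint was saved as a dictionary with 'model', 'optimizer', 'lr_scheduler' and 'epoch' entries in the first place. A minimal sketch of that save side, assuming those key names and a placeholder file path (on Colab, point it at mounted Google Drive so the file survives a disconnect):

import torch

CHECKPOINT_PATH = 'drive/MyDrive/detr_last_checkpoint.pth'  # placeholder path

# Call this at the end of every epoch so the latest state is always on disk
def save_checkpoint(model, optimizer, lr_scheduler, epoch):
    torch.save({
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'lr_scheduler': lr_scheduler.state_dict(),
        'epoch': epoch,
    }, CHECKPOINT_PATH)

Since the question trains with the PyTorch Lightning Trainer, another option is to let Lightning manage the checkpoint itself: a ModelCheckpoint callback with save_last=True keeps a last.ckpt file up to date, and passing ckpt_path to trainer.fit restores the model weights, optimizer, scheduler and epoch counter before continuing. A sketch building on the question's Trainer setup (the checkpoints/ directory is a placeholder; again, prefer a Google Drive path on Colab):

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep a last.ckpt file up to date during training
checkpoint_callback = ModelCheckpoint(dirpath='checkpoints/', save_last=True)

trainer = Trainer(
    devices=1,
    accelerator="gpu",
    max_epochs=MAX_EPOCHS,
    gradient_clip_val=0.1,
    accumulate_grad_batches=8,
    log_every_n_steps=5,
    callbacks=[early_stopping_callback, checkpoint_callback]
)

# In a fresh Colab session, resume from where the previous run stopped
trainer.fit(model, ckpt_path='checkpoints/last.ckpt')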