I am trying to fine-tune an SBERT model from Hugging Face using TSDAE.
I am renting a server with an RTX A6000 GPU with 48 GB of memory.
I am reading chunks of the text file (train set) and, for each chunk, extracting sentences.
I have a loop that submits 1000 sentences for training each time. Then I save the model, move to the next iteration of the loop, read the next 1000 sentences, reload the model from disk and train again.
For the first 5 or 6 iterations the training runs without problems. Then it fails with "Out of memory", even though I am clearing the cache (torch.cuda.empty_cache()) before each new phase.
The code is the following:
import gzip
import logging
import os

import torch
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

def train_denoising(train_sentences, modelName):
    torch.cuda.empty_cache()
    # os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:22"
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    word_embedding_model = models.Transformer(modelName)
    # Apply CLS pooling to get one fixed-size sentence vector
    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), 'cls')
    model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
    # model = SentenceTransformer(modelName)

    train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
    train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=modelName, tie_encoder_decoder=True)

    model.fit(
        train_objectives=[(train_dataloader, train_loss)],
        epochs=num_epochs,
        weight_decay=0,
        scheduler='constantlr',
        optimizer_params={'lr': 3e-5},
        show_progress_bar=True,
        # checkpoint_path=model_output_path,
        use_amp=False,  # Set to True if your GPU supports FP16
        output_path='./yobb_model')
    return model
num_sentences = 1000
train_sentences = []

for path in paths:
    with gzip.open(path, 'rt', encoding='utf8') if path.endswith('.gz') else open(path, encoding='utf8') as f:
        for piece in read_in_chunks(f, chunk_size=500*1024):
            aux = [line.lower() for line in splitter.split(piece) if len(line) > 10]
            count = len(aux) // num_sentences
            index = 0
            # iterate over the sentences, taking <num_sentences> at a time
            for i in range(count):
                train_sentences.extend(aux[index:index + num_sentences])
                index += num_sentences
                if len(train_sentences) <= 0:
                    continue
                print("Number of sentences {}".format(len(aux)))
                logging.info("{} train sentences".format(len(train_sentences)))
                train_denoising(train_sentences, "./yobb_model")
                train_sentences.clear()
            # train on the remaining sentences that did not fill a full block
            count = len(aux) % num_sentences
            if count > 0:
                print("Number of sentences {}".format(len(aux[-count:])))
                logging.info("{} train sentences".format(len(aux[-count:])))
                train_denoising(aux[-count:], "./yobb_model")
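For context, read_in_chunks just yields fixed-size pieces of the file and splitter splits a piece into sentences. Simplified sketches of these helpers (not my exact implementations, the real ones differ in detail):

import re

def read_in_chunks(file_obj, chunk_size=500 * 1024):
    # Yield the file in pieces of roughly chunk_size characters
    while True:
        piece = file_obj.read(chunk_size)
        if not piece:
            break
        yield piece

class SimpleSplitter:
    # Naive sentence splitter on end-of-sentence punctuation
    def split(self, text):
        return re.split(r'(?<=[.!?])\s+', text)

splitter = SimpleSplitter()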
The error is the classic one:
OutOfMemoryError: CUDA out of memory. Tried to allocate 920.00 MiB (GPU 0; 47.54 GiB total capacity; 43.35 GiB already allocated; 517.88 MiB free; 46.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The batch size is 16.
I have tried the max_split_size_mb setting suggested by the error message after getting Out Of Memory, in several different attempts.
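One of those attempts set the allocator option through the environment variable, along the lines of the commented-out line in train_denoising (the value below is just an example; as far as I understand, the variable has to be set before CUDA is first initialized to have any effect):

import os
# Allocator option suggested by the error message; set before any CUDA work
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch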
Why does PyTorch fail, even though it trained on 1000 sentences without problems in the previous iterations?
Why can't it release the GPU memory, given that I am starting a new training run each time?