After tokenizing the dataset, we tried to extract DistilBERT embeddings for our dataset (5,000 text records in a DataFrame), but a memory error occurred with the following code:
outputs = model(**tokenized_inputs0)
bert_embeddings = outputs.last_hidden_state
So we split the DataFrame into a list of 100-row chunks using the following code:
list_train = [final_Data1[i:i+100] for i in range(0,final_Data1.shape[0],100)]
Now, how do we extract the DistilBERT embeddings for each chunk in list_train? That is, how do we apply the following code to every element of the list?
outputs = model(**tokenized_inputs0)
bert_embeddings = outputs.last_hidden_state
You could use a for loop to iterate over the list of chunks you created. You will also need to get the predictions out of memory as you go, for example by saving each batch to a file; otherwise you'll still run out of memory once the loop has processed enough chunks.
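As a minimal sketch, assuming model and tokenizer are the usual Hugging Face DistilBERT model and tokenizer objects, and assuming your text sits in a DataFrame column named "text" (adjust that name to match your data):

import torch

model.eval()
with torch.no_grad():  # inference only; skipping gradient tracking saves a lot of memory
    for i, chunk in enumerate(list_train):
        tokenized = tokenizer(
            chunk["text"].tolist(),  # "text" is an assumed column name
            padding=True,
            truncation=True,
            return_tensors="pt",
        )
        outputs = model(**tokenized)
        # Write each batch of embeddings to disk instead of accumulating them in memory
        torch.save(outputs.last_hidden_state, f"embeddings_chunk_{i}.pt")

You can later reload the saved chunks with torch.load and, if you need one tensor, concatenate them with torch.cat.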