How to extract DistilBERT embeddings from a list containing 5000 records?


After tokenizing the dataset, we tried to extract DistilBERT embeddings for our dataset (5,000 text records in a dataframe), but a memory error occurred with the following code:

outputs = model(**tokenized_inputs0)
bert_embeddings = outputs.last_hidden_state

So we split the dataframe into a list of batches using the following code:

list_train = [final_Data1[i:i+100] for i in range(0, final_Data1.shape[0], 100)]

Now, how can we extract the DistilBERT embeddings for the above list_train?

How can we apply the following code to extract DistilBERT embeddings for each element of the list?

outputs = model(**tokenized_inputs0)
bert_embeddings = outputs.last_hidden_state


1 Answer

Answered by Jesse Sealand

You could use a for loop to iterate over the list of batches you created. Note that each batch is still a slice of the raw dataframe, so it needs to be tokenized before it goes through the model. You will also have to do something with your predictions to get them out of memory, like saving them to a file, otherwise you'll still run out of memory.

batch_list = [final_Data1[i:i+100] for i in range(0, final_Data1.shape[0], 100)]

for batch in batch_list:
    # tokenize the raw text in each batch first (assumes a "text" column)
    tokenized_batch = tokenizer(batch["text"].tolist(), padding=True,
                                truncation=True, return_tensors="pt")

    with torch.no_grad():  # no gradients needed, saves memory
        outputs = model(**tokenized_batch)
    bert_embeddings = outputs.last_hidden_state

    # do something with your outputs (e.g. save them to a file)
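
For the "do something with your outputs" step, one option mentioned above is writing each batch to a file. A minimal sketch of that idea, assuming the model, tokenizer, and batch_list from the code above and a "text" column in the dataframe (the file names are just an example):

import torch

# process each batch and immediately write its embeddings to disk, so that
# only one batch of embeddings is ever held in memory at a time
for batch_idx, batch in enumerate(batch_list):
    tokenized_batch = tokenizer(batch["text"].tolist(), padding=True,
                                truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**tokenized_batch)

    # detach from the graph and move to CPU before saving
    embeddings = outputs.last_hidden_state.detach().cpu()
    torch.save(embeddings, f"distilbert_embeddings_batch_{batch_idx}.pt")

The saved tensors can later be reloaded with torch.load and, if the batches were padded to the same sequence length, stacked back together with torch.cat.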