Creating Embedding Matrix for LSTM Model with BERT Feature Representations on Arabic Dataset


I'm working on implementing an LSTM model for an Arabic dataset using BERT feature representations. I've utilized the 'asafaya/bert-base-arabic' model for this purpose:

```
from transformers import AutoModelForMaskedLM

bert_model = AutoModelForMaskedLM.from_pretrained('asafaya/bert-base-arabic')
```

Now, I'm facing the challenge of creating an embedding_matrix to be used in the following statement:

```
model_LSTM.add(Embedding(vocab_length, embedding_vector_features,
                         weights=[embedding_matrix],
                         input_length=length_long_sentence))
```

Since BERT produces contextual embeddings, the feature representation of the same word varies with its context, whereas a Keras Embedding layer expects a single static vector per word.
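
For illustration, a minimal check along these lines (the sentence pair and helper function are purely illustrative, not part of my pipeline) shows the same word receiving different vectors in two different sentences:

```
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('asafaya/bert-base-arabic')
encoder = AutoModel.from_pretrained('asafaya/bert-base-arabic')

def token_vector(sentence, word):
    # Hidden state of the first subword piece of `word` inside `sentence`.
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[0]
    piece_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word)[0])
    position = (inputs['input_ids'][0] == piece_id).nonzero()[0].item()
    return hidden[position]

# "المدرسة" (school) in two different contexts: the cosine similarity is
# below 1.0 because the vectors are context-dependent.
v1 = token_vector('ذهب الولد إلى المدرسة', 'المدرسة')
v2 = token_vector('المدرسة الفكرية الجديدة', 'المدرسة')
print(torch.cosine_similarity(v1, v2, dim=0))
```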

I would appreciate any guidance or suggestions on how to effectively create the embedding matrix for this scenario. Thank you!

I tried the following:

```
from transformers import AutoModelForMaskedLM

def bert_embedding_matrix():
    bert = AutoModelForMaskedLM.from_pretrained("asafaya/bert-base-arabic",
                                                output_hidden_states=True)
    # Drill down to the BertEmbeddings module and take its static
    # word-embedding table (one row per entry in BERT's vocabulary).
    bert_embeddings = list(bert.children())[0]
    bert_word_embeddings = list(bert_embeddings.children())[0]
    mat = bert_word_embeddings.word_embeddings.weight  # torch.Size([32000, 768])
    return mat

embedding_matrix = bert_embedding_matrix()
```

but I get the following error: `ValueError: Layer embedding_1 weight shape (8155, 300) is not compatible with provided weight shape torch.Size([32000, 768])`.
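
If I'm reading the error right, my Embedding layer was configured for my own vocabulary (vocab_length = 8155) with 300-dimensional vectors, while BERT's static table is 32000 × 768, so the shapes can't line up. What I'm considering is building a (vocab_length, 768) matrix by looking each of my words up in BERT's table. A rough sketch (not tested; `word_index` and `vocab_length` come from my own Keras tokenizer, and averaging subword pieces is just one possible choice):

```
import numpy as np
from transformers import AutoTokenizer

bert_tokenizer = AutoTokenizer.from_pretrained('asafaya/bert-base-arabic')

# BERT's static word-embedding table as a plain array: shape (32000, 768).
bert_table = embedding_matrix.detach().numpy()

# One row per word in *my* vocabulary; row 0 stays zero for padding.
embedding_vector_features = 768  # must match BERT's hidden size, not 300
embedding_matrix_lstm = np.zeros((vocab_length, embedding_vector_features))

for word, idx in word_index.items():  # word_index: my word -> index map
    if idx >= vocab_length:
        continue
    # Average the static embeddings of the word's subword pieces.
    piece_ids = bert_tokenizer.encode(word, add_special_tokens=False)
    if piece_ids:
        embedding_matrix_lstm[idx] = bert_table[piece_ids].mean(axis=0)
```

I would then pass embedding_matrix_lstm as the weights and set embedding_vector_features = 768 in the Embedding layer above. Is this kind of mapping (and the subword averaging) a reasonable way to use BERT representations with an LSTM, or is there a better approach?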
