How do I train word embeddings within a large block of custom text using BERT?


I found a great tutorial on generating contextualized word embeddings for a custom sentence here: http://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/

However, it does not explain how to do this for a larger paragraph. I have around 1,000 tokens that I want the model to learn from. How can I adapt the linked code and apply it to a whole paragraph, so that each word picks up context from the whole document?


1 Answer

Answer by polm23:

The tutorial you link to currently uses Hugging Face Transformers. According to the authors, their BERT model is limited to 512 tokens, so a ~1,000-token paragraph will not fit in a single input. If you want to process longer sequences you'll need to train your own BERT from scratch.
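As a minimal sketch (not part of the original answer), here is how you could check your document against the 512-token limit and pull per-token contextual embeddings with the Hugging Face `transformers` library; the model name (`bert-base-uncased`) and the placeholder `document` string are assumptions for illustration:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Placeholder: your ~1,000-token paragraph would go here.
document = "Your long paragraph of custom text goes here."

# Tokenize without truncation first to see how long the document really is.
token_ids = tokenizer.encode(document, add_special_tokens=True)
print(f"Document length in BERT tokens: {len(token_ids)}")

# Standard BERT checkpoints have max_position_embeddings == 512.
if len(token_ids) > model.config.max_position_embeddings:
    print("Too long for a single forward pass; the input will be truncated.")

# Encode with truncation so the forward pass does not fail on long input.
inputs = tokenizer(document, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per (sub)token: shape (1, num_tokens, hidden_size).
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```

This only shows why the 512-token ceiling bites on your input and what the per-token output looks like; it does not get around the limit itself.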

Note that in general getting good embeddings for long documents is still an area of active research and you won't get good results just by changing some numbers in a configuration file.