spaCy - Confusion about word vectors and tok2vec

It would be really helpful if you could help me understand some underlying concepts of spaCy.

I understand that some spaCy models ship with predefined static vectors; for the Spanish models, for example, these are the vectors generated by FastText. I also understand that there is a tok2vec layer that generates vectors from tokens, and that this is used, for example, as the input to the NER component of the model.

If the above is correct, then I have some questions:

  • Does the NER component also use the static vectors?
    • If yes, then where does the tok2vec layer come into play?
    • If no, then is there any advantage in using the lg or md models if you only intend to use the model for e.g. the NER component?
  • Is the tok2vec layer already trained for pretrained downloaded models, e.g. Spanish?
  • If I replace the NER component of a pretrained model, does it keep the tok2vec layer untouched i.e. with the learned weights?
  • Is the tok2vec layer also trained when I train a NER model?
  • Would the pretrain command help the tok2vec layer learn some domain-specific words that may be OOV?

Thanks a lot!

There is 1 answer

Sofie VL On BEST ANSWER

Does the NER component also use the static vectors?

This is addressed in points 2 and 3 of my answer here.
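Independently of how a given component uses them, you can check whether a loaded pipeline ships a static vectors table at all by looking at its shape. The following is only a minimal sketch: `describe_pipeline` is a hypothetical helper, and the demo object is a stand-in with made-up numbers so the snippet runs without downloading a model. It relies only on `nlp.vocab.vectors.shape` and `nlp.pipe_names`, which spaCy `Language` objects provide.

```python
from types import SimpleNamespace


def describe_pipeline(nlp):
    """Summarise the static vectors table and the component names of a pipeline."""
    # Vectors.shape is (number of rows, vector width); zero rows means no table.
    n_rows, width = nlp.vocab.vectors.shape
    return {
        "has_static_vectors": n_rows > 0,
        "vector_rows": n_rows,
        "vector_width": width,
        "components": list(nlp.pipe_names),
    }


# Stand-in for a loaded pipeline (assumption: the numbers are illustrative only,
# not the real sizes of any released model).
fake_nlp = SimpleNamespace(
    vocab=SimpleNamespace(vectors=SimpleNamespace(shape=(20000, 300))),
    pipe_names=["tagger", "parser", "ner"],
)

report = describe_pipeline(fake_nlp)
print(report)
```

With spaCy and a model installed, you would pass a real pipeline instead, for example `describe_pipeline(spacy.load("es_core_news_md"))`.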

Is the tok2vec layer already trained for pretrained downloaded models, e.g. Spanish?

Yes, the full model is trained, and the tok2vec layer is a part of it.

If I replace the NER component of a pretrained model, does it keep the tok2vec layer untouched i.e. with the learned weights?

No, not in the current spaCy v2. There, the tok2vec layer is part of the component's model, so if you remove the model, you also remove its tok2vec layer. In the upcoming v3, you'll be able to separate these, so you can in fact keep the tok2vec model separately and share it between components.
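In spaCy v3 as released, this separation is expressed in the training config: a standalone `tok2vec` component holds the embedding layer, and other components refer to it through a listener. The snippet below is a hedged sketch, not a complete config. The architecture names and settings follow the v3 documentation as I remember it and may differ between v3 releases, the `[components.ner.model]` section is left out for brevity, and `components_sharing_tok2vec` is a hypothetical helper that only uses the standard library to read the fragment.

```python
import configparser

# Fragment of a spaCy v3 training config (assumption: values are illustrative,
# and the [components.ner.model] section is omitted).
CONFIG_FRAGMENT = """
[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM", "PREFIX", "SUFFIX", "SHAPE"]
rows = [5000, 1000, 2500, 2500]
include_static_vectors = true

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3

[components.ner]
factory = "ner"

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"
"""


def components_sharing_tok2vec(config_text):
    """Return names of components whose tok2vec sublayer is a listener."""
    # interpolation=None keeps the ${...} references as plain strings.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    shared = []
    for section in parser.sections():
        arch = parser[section].get("@architectures", "").strip('"')
        if arch.startswith("spacy.Tok2VecListener"):
            # Section names look like "components.<name>.model.tok2vec".
            shared.append(section.split(".")[1])
    return shared


shared_components = components_sharing_tok2vec(CONFIG_FRAGMENT)
print(shared_components)
```

Here `include_static_vectors` is the setting that controls whether the static vectors are fed into the embedding layer, and the listener is what lets `ner` reuse the shared `tok2vec` output instead of owning its own copy.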

Is the tok2vec layer also trained when I train a NER model?

Yes - see above

Would the pretrain command help the tok2vec layer learn some domain-specific words that may be OOV?

See also my answer at https://stackoverflow.com/a/63520262/7961860
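If you do want to try `pretrain` on domain text, spaCy v2's command reads raw text as JSONL, one object per line with a `"text"` key. Below is a minimal sketch of preparing such a file. `write_pretrain_jsonl` is a hypothetical helper, and the example sentences and output path are placeholders, not real data.

```python
import json
import tempfile
from pathlib import Path


def write_pretrain_jsonl(texts, path):
    """Write one {"text": ...} object per line, skipping empty strings."""
    count = 0
    with Path(path).open("w", encoding="utf8") as f:
        for text in texts:
            text = text.strip()
            if not text:
                continue
            # ensure_ascii=False keeps accented characters readable in the file.
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Placeholder domain sentences (assumption: stand-ins for your own corpus).
domain_texts = [
    "El paciente presenta taquicardia sinusal.",
    "",
    "Se indica tratamiento con anticoagulantes.",
]
out_path = Path(tempfile.gettempdir()) / "raw_text.jsonl"
n_written = write_pretrain_jsonl(domain_texts, out_path)
print(n_written, "lines written to", out_path)
```

In v2, the resulting file would be the first argument to `python -m spacy pretrain`, followed by a vectors model and an output directory; the saved weights can then be passed to `spacy train` via `--init-tok2vec`.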

If you have further questions - happy to discuss in the comments!