Here is the start of my config file:

# This is an auto-generated partial config. To use it with 'spacy train'
# you can run spacy init fill-config to auto-fill all default settings:
# python -m spacy init fill-config ./base_config.cfg ./config.cfg
[paths]
train = null
dev = null
vectors = "en_core_web_lg"

But I am training my model on my own labels and spans. What is the vectors = "en_core_web_lg" setting used for?

After all, I am using the following logic to build the training data for my model:

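import spacy
from spacy.tokens import DocBin, Span, SpanGroup

# Start from a blank English pipeline (tokenizer only):
nlp = spacy.blank("en")
# Create a DocBin object to collect the annotated docs:
db = DocBin()
for text, annotations in input: # Data in previous format
    doc = nlp(text)
    ents = []
    spans = []
    for start, end, label in annotations: # Character offsets
        # Note: this span covers the whole doc, not just start..end
        spans.append(Span(doc, 0, len(doc), label=label))
        span = doc.char_span(start, end, label=label)
        # char_span returns None if the offsets do not match token boundaries
        if span is not None:
            ents.append(span)
    doc.ents = ents # Label the text with the ents
    group = SpanGroup(doc, name="sc", spans=spans)
    doc.spans["sc"] = group
    db.add(doc) # Add each doc once, after all its annotations are set
db.to_disk(output_path)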

Please explain where these vectors are used in such a configuration. My annotated data is a list in the format [(text_1, [(start_1, end_1, LABEL_1)]), (text_2, [(start_2, end_2, LABEL_1)]), ...].

There is 1 answer

aab On BEST ANSWER

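The vectors setting points to pretrained static word vectors (here, the ones packaged with en_core_web_lg); they are separate from your own labels and spans. During training, these static vectors are included as a tok2vec feature if you have include_static_vectors = true in the tok2vec config.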
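
As a rough sketch of where this shows up, the fragment below shows the parts of a config that typically refer to the vectors. It assumes a config generated by the spaCy quickstart or spacy init config, with a tok2vec component that uses the MultiHashEmbed embedding layer; your filled config.cfg may use different section names or architecture versions, so check it rather than relying on this excerpt.

```ini
[paths]
train = null
dev = null
vectors = "en_core_web_lg"

[initialize]
# The vectors named in [paths] are loaded into the vocab when the pipeline is initialized
vectors = ${paths.vectors}

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
# When true, the static vectors are used as an extra input feature of the embedding layer
include_static_vectors = true
# (other embed settings omitted from this excerpt)
```

If include_static_vectors is false in your filled config, the embedding layer does not use the static vectors as a feature.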