I have a pandas DataFrame of articles and have applied spaCy's named entity recognition model to extract entities. I would now like to use those recognized entities to measure co-occurrence. The problem is that the processed articles are stored in a DataFrame column whose values are of type spacy.tokens.doc.Doc, and I don't know how to turn that data into a co-occurrence matrix.

Thank you for the help!

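import spacy
import pandas as pd
from collections import Counter

nlp = spacy.load("es_core_news_sm")

# data1 is the DataFrame of articles; the TEXTO column holds the raw text
data1['nlp'] = data1.TEXTO.apply(nlp)

# Print the 3 most common entities in each article
for article in data1['nlp']:
    items = [x.text for x in article.ents]
    print(Counter(items).most_common(3))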

Right now I get a list of the 3 most common entities in each article, but I would like to compute co-occurrence on those entities, and I don't know how to get from spacy.tokens.doc.Doc objects to a matrix.
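
To make the goal concrete, here is a rough sketch of what I am after, assuming that "co-occurrence" means "two entities are mentioned in the same article". The function name entity_cooccurrence and the toy input are made up for illustration; in the real case the input would be [[ent.text for ent in doc.ents] for doc in data1['nlp']]. It only uses the standard library, so it may not be the most efficient way to do this on a large corpus.

```python
from collections import Counter
from itertools import combinations


def entity_cooccurrence(entity_lists):
    """Count how many articles each pair of entities appears in together.

    entity_lists: one list of entity strings per article.
    Returns (labels, matrix), where matrix[i][j] is the number of articles
    that mention both labels[i] and labels[j].
    """
    pair_counts = Counter()
    vocabulary = set()
    for entities in entity_lists:
        unique = sorted(set(entities))  # count each entity once per article
        vocabulary.update(unique)
        # Sorted input means every pair comes out in a consistent order
        pair_counts.update(combinations(unique, 2))

    labels = sorted(vocabulary)
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), count in pair_counts.items():
        matrix[index[a]][index[b]] = count
        matrix[index[b]][index[a]] = count  # the matrix is symmetric
    return labels, matrix


# Toy input (hypothetical) standing in for the entities of three articles
example = [
    ["Madrid", "Sevilla", "Madrid"],
    ["Madrid", "Barcelona"],
    ["Sevilla", "Madrid"],
]
labels, matrix = entity_cooccurrence(example)
```

The result could then be wrapped as pd.DataFrame(matrix, index=labels, columns=labels) to get a labelled table. The diagonal is left at zero here; whether it should instead hold each entity's own article count is a design choice I am unsure about.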
