I have created a pandas DataFrame containing articles, and I have applied spaCy's named entity recognition model to extract entities. Now I would like to use those recognized entities to measure word co-occurrence. The problem is that the entities are stored in a DataFrame column whose values are of type spacy.tokens.doc.Doc, and I don't know how to transform that data into a co-occurrence matrix.
Thank you for the help!
from collections import Counter

import spacy
import pandas as pd

nlp = spacy.load("es_core_news_sm")
data1['nlp'] = data1.TEXTO.apply(lambda x: nlp(x))

for article in data1['nlp']:
    items = [x.text for x in article.ents]
    print(Counter(items).most_common(3))
Right now I get the list of the 3 most common entities in each article, but I would like to compute word co-occurrence over the results, and I don't know how to get from spacy.tokens.doc.Doc objects to a matrix.
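To make the goal concrete, here is a sketch of the kind of matrix I have in mind, using made-up per-article entity lists in place of the actual spaCy output (in my case each list would come from `[ent.text for ent in doc.ents]`):

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Made-up entity lists, one per article, standing in for the spaCy output.
entity_lists = [
    ["Madrid", "Pedro Sanchez", "PSOE"],
    ["Madrid", "PSOE"],
    ["Pedro Sanchez", "PSOE"],
]

# Count how often each unordered pair of entities appears in the same article.
pair_counts = Counter()
for ents in entity_lists:
    for a, b in combinations(sorted(set(ents)), 2):
        pair_counts[(a, b)] += 1

# Turn the pair counts into a symmetric entity-by-entity co-occurrence matrix.
vocab = sorted({e for ents in entity_lists for e in ents})
matrix = pd.DataFrame(0, index=vocab, columns=vocab)
for (a, b), n in pair_counts.items():
    matrix.loc[a, b] = n
    matrix.loc[b, a] = n

print(matrix)
```

Here "Madrid" and "PSOE" co-occur in two articles, so the matrix holds 2 in that cell. Is something like this the right way to go, or is there a more standard approach?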