I recently tried to visualize TextRank output in code, but I noticed that the terms in the graph are not lemmatized. Is there a way to fix the following code so that all words in textrank_df['parse'] are lemmatized? I checked the pipeline components, and all the required ones are in place ('tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner'), so I'm really not sure what went wrong.
import pytextrank
import spacy
import scattertext as st
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe("textrank", last=True)
convention_df = textrank_df.assign(
    parse=lambda textrank_df: textrank_df['Combined'].apply(nlp),
)
corpus = st.CorpusFromParsedDocuments(
    convention_df,
    category_col='Response Variable',
    parsed_col='parse',
    feats_from_spacy_doc=st.PyTextRankPhrases(),
).build()
I tried code 1 below, but it raises: AttributeError: module 'pytextrank' has no attribute 'TextRank'. I think it might have something to do with the format of the 'parse' column after this change.
code 1
convention_df = textrank_df.assign(
    parse=lambda textrank_df: textrank_df['Combined'].apply(
        lambda x: [token.lemma_ for token in nlp(x)]
    ),
)
I also tried code 2, which adds use_lemmas=True to PyTextRankPhrases(), but that did not work either. The words are still shown in their original form.
code 2
corpus = st.CorpusFromParsedDocuments(
    convention_df,
    category_col='Response Variable',
    parsed_col='parse',
    feats_from_spacy_doc=st.PyTextRankPhrases(use_lemmas=True),
).build()
I'm one of the authors of PyTextRank, and I've tried out the code shown above.
There are some issues with the usage of scattertext in that example. I don't think the line would work correctly. There's no source text defined, from what I can see, and the textrank_df variable is undefined as far as Python is concerned.
Is this code based on the example in scattertext?
https://github.com/JasonKessler/scattertext/blob/master/demo_pytextrank.py
My suggestion would be:
1. First, get the text running through the spaCy pipeline.
2. Next, get the PyTextRank pipeline for spaCy configured and running the way you want it to work.
3. Then add scattertext and debug that portion.
It might also be good to ask Jason & co. from scattertext what they'd recommend.
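The first two steps could be checked in isolation with something like the sketch below, before scattertext is involved at all. It is only an illustration under stated assumptions: `lemmatized_phrases` and `demo` are hypothetical helper names, the sample sentence is made up, and the code assumes PyTextRank 3.x, where each entry in `doc._.phrases` exposes `chunks` (spaCy spans) and `rank`.

```python
def lemmatized_phrases(doc):
    """Return (lemmatized phrase, rank) pairs for a Doc processed by PyTextRank.

    Assumption: the "textrank" pipe has run, so `doc._.phrases` exists and
    each phrase has `chunks` (a non-empty list of spaCy spans) and `rank`.
    """
    results = []
    for phrase in doc._.phrases:
        # Rebuild the phrase text from the token lemmas of its first chunk.
        lemma_text = " ".join(token.lemma_ for token in phrase.chunks[0])
        results.append((lemma_text, phrase.rank))
    return results


def demo(text="The researchers were studying ranked phrases in several documents."):
    """Hypothetical walk-through; requires spaCy, pytextrank and en_core_web_sm."""
    # Imports live here so the helper above can be used without loading a model.
    import spacy
    import pytextrank  # noqa: F401  (importing registers the "textrank" factory)

    nlp = spacy.load("en_core_web_sm")

    # Step 1: confirm that the plain spaCy pipeline produces lemmas.
    doc = nlp(text)
    print([(token.text, token.lemma_) for token in doc])

    # Step 2: add PyTextRank and inspect the ranked phrases as lemmas.
    nlp.add_pipe("textrank", last=True)
    doc = nlp(text)
    for lemma_text, rank in lemmatized_phrases(doc):
        print(f"{rank:.4f}  {lemma_text}")
```

Calling `demo()` prints the token/lemma pairs and then the ranked phrases rebuilt from lemmas. Once both look right, the parsed Docs can be handed to scattertext as the third step.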