Apply POS tag to nested list

Question

151 views Asked by stripes 123 At 24 January 2023 at 15:44

I'm trying to go through multiple sentences in a text. Each sentence is stored in a nested list structure (i.e. a list in which each sentence is itself a list of tokens). I then want to apply a POS tag to each token in the sentence and store the result in another nested list. Ultimately, this is so that I can add it to a dataframe and export it to Excel in one column (where each row is a sentence).

The trouble I'm having is that the POS tag list only seems to capture the last sentence in the text. Here is part of the code:

for sentences in doc1.sents:  # iterates over sentences in doc
    for match_id, start, end in phrase_matcher(nlp(sentences.text)):
        if nlp.vocab.strings[match_id] in ["key"]:
            found_sentences = sentences.text
            duplicate_sentence_list.append(found_sentences)
            all_separated_words_list.append(text_preprocessing(found_sentences))
            tokens = nltk.word_tokenize(sentence)
            tags = nltk.pos_tag(tokens)
            pos_list.append(tags)

When I try adding the POS tag to a for loop like below:

for sentences in doc1.sents:  # iterates over sentences in doc
    for match_id, start, end in phrase_matcher(nlp(sentences.text)):
        if nlp.vocab.strings[match_id] in ["key"]:
            found_sentences = sentences.text
            duplicate_sentence_list.append(found_sentences)
            all_separated_words_list.append(text_preprocessing(found_sentences))
            for i in found_sentences:
                pos_list.append(nltk.pos_tag(i))

I get this error:

TypeError: tokens: expected a list of strings, got a string

When I change the for loop to use the nested list (all_separated_words_list), I get this error:

Output exceeds the size limit. Open the full output data in a text editor

AttributeError                            Traceback (most recent call last)
/var/folders/6g/n1v5s0vj77xc2htytg4spx_r0000gn/T/ipykernel_17689/361983526.py in
 14 all_separated_words_list.append(text_preprocessing(found_sentences))
 15 for i in found_sentences:
 16 pos_list.append(nltk.pos_tag(all_separated_words_list))
 17 # tokens = nltk.word_tokenize(i)
 18 # tags = nltk.pos_tag(tokens)

~/opt/anaconda3/lib/python3.9/site-packages/nltk/tag/__init__.py in pos_tag(tokens, tagset, lang)
 164 """
 165 tagger = _get_tagger(lang)
 166 return _pos_tag(tokens, tagset, tagger, lang)
 167
 168

~/opt/anaconda3/lib/python3.9/site-packages/nltk/tag/__init__.py in _pos_tag(tokens, tagset, tagger, lang)
 121
 122 else:
 123 tagged_tokens = tagger.tag(tokens)
 124 if tagset: # Maps to the specified tagset.
 125 if lang == "eng":

~/opt/anaconda3/lib/python3.9/site-packages/nltk/tag/perceptron.py in tag(self, tokens, return_conf, use_tagdict)
 178 output = []
 ...
 277 if word.isdigit() and len(word) == 4:
 278 return "!YEAR"
 279 if word and word[0].isdigit():

AttributeError: 'list' object has no attribute 'isdigit'

So I'm not too sure how to proceed. I would appreciate any help.

There is 1 answer

**larapsodia** · Answer 1 · 2023-01-24T21:51:37+00:00

The AttributeError is telling you that the tagger expected a string (a single token), but instead it got a list.

for i in found_sentences:
    pos_list.append(nltk.pos_tag(i))

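I suspect what's happening is that at this point you think you're giving it a single tokenized sentence, but all_separated_words_list is actually a list of tokenized sentences. So when the tagger iterates over it, it finds a list (a tokenized sentence) where it expects a string (an individual word). The loop above has the opposite problem: found_sentences is a single string, so i is one character at a time, which is why nltk.pos_tag complains that it got a string instead of a list of strings.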
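One way to tag sentence by sentence could look like the sketch below. It assumes all_separated_words_list holds one list of tokens per sentence. The names tag_nested, tag_fn and toy_tagger are made up for illustration and are not part of NLTK; when tag_fn is omitted, the helper falls back to nltk.pos_tag, which needs NLTK and its tagger model to be installed. The toy tagger is only there so the example runs without NLTK data, and its tags are not real POS tags.

```python
def tag_nested(token_lists, tag_fn=None):
    """Return one list of (token, tag) pairs per inner token list."""
    if tag_fn is None:
        # Assumption: NLTK and its tagger model are installed.
        import nltk
        tag_fn = nltk.pos_tag

    pos_list = []
    for tokens in token_lists:
        # Each item must be a list of tokens; fail early if it is a bare string.
        if isinstance(tokens, str):
            raise TypeError("expected a list of tokens per sentence, got a string")
        # Tag one sentence at a time and keep the result as its own inner list.
        pos_list.append(tag_fn(list(tokens)))
    return pos_list


def toy_tagger(tokens):
    """Hypothetical stand-in: capitalised words get NNP, everything else X."""
    return [(tok, "NNP" if tok[:1].isupper() else "X") for tok in tokens]


all_separated_words_list = [["Alice", "has", "the", "key"], ["the", "key", "fits"]]
pos_list = tag_nested(all_separated_words_list, tag_fn=toy_tagger)

# One string per sentence, e.g. for a single spreadsheet column.
rows = [" ".join(f"{tok}/{tag}" for tok, tag in sent) for sent in pos_list]
print(rows)
```

Because pos_list keeps one inner list per sentence, rows has one entry per sentence, which is the shape you want if each row of the dataframe column should be a sentence.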

Go back over your code again, looking at the output of each line and you'll be able to see where it is going wrong.
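A quick way to do that check is to print the shape of each variable before passing it to the tagger. The describe helper below is a hypothetical name written for this sketch, and the sample values are made up; it only looks at the first element of each list, so it assumes the lists are homogeneous.

```python
def describe(value):
    """Describe list nesting based on the first element, e.g. 'list[list[str]]'."""
    if isinstance(value, list):
        inner = describe(value[0]) if value else "?"
        return f"list[{inner}]"
    return type(value).__name__


found_sentences = "the key fits"
all_separated_words_list = [["the", "key", "fits"]]

print(describe(found_sentences))              # a plain string
print(describe(all_separated_words_list))     # a list of token lists
print(describe(all_separated_words_list[0]))  # a single token list
```

nltk.pos_tag wants the last shape, a flat list of strings, so anything that prints as a bare string or as a list of lists needs one more step before tagging.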
