Comparing sub items of lists and making changes in Python


I have two lists originating from a part of speech tagger which look as follows:

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]


pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

I want to create a final list that updates pos_tags with the tags given in pos_names. In other words, I need to find John and Murphy in pos_tags and replace their POS tags with NNP.


There are 3 answers

1
xnx On BEST ANSWER

I agree that a dictionary would be a more natural solution to this problem, but if you need to keep pos_tags in order, a more explicit solution would be:

for word, pos in pos_names:
    for i, (tagged_word, tagged_pos) in enumerate(pos_tags):
        if word == tagged_word:
            pos_tags[i] = (word, pos)

(A dictionary would probably be faster for a large number of words, so you might want to consider storing the word order in a list and doing your POS allocation using a dictionary.)
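
The list-plus-dictionary idea in the parenthetical could look roughly like the sketch below. It keeps pos_tags as an ordered list and uses a dictionary built from pos_names only for lookups, so the list is walked once instead of once per name. The name `overrides` and the shortened sample data (with 'John' deliberately mis-tagged as NN) are assumptions made for illustration, not part of the question.

```python
# Sample data; the 'NN' tag on 'John' is an assumed tagger mistake for illustration.
pos_tags = [('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('John', 'NN'), ('Murphy', 'NNP')]
pos_names = [('John', 'NNP'), ('Murphy', 'NNP')]

# Build the lookup table once; each membership test is then O(1) on average.
overrides = dict(pos_names)

# Walk the list once and overwrite matching entries in place, preserving order.
for i, (word, tag) in enumerate(pos_tags):
    if word in overrides:
        pos_tags[i] = (word, overrides[word])

print(pos_tags)
```

Like the nested loop, this modifies pos_tags in place and retags every occurrence of a matching word.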

0
Kevin On

You could create a dictionary from pos_names that behaves as a lookup table. Then you can use get to search the table for possible replacements, and leave the tag as-is if no replacement is found.

d = dict(pos_names)
pos_tags = [(word, d.get(word, tag)) for word, tag in pos_tags]
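
To see the effect, here is a small sketch that wraps the same two lines in a function and runs them on sample data. The function name `retag` and the corrections for 'Planet' and 'Earth' are hypothetical, chosen only so that the output visibly differs from the input.

```python
def retag(pos_tags, pos_names):
    """Return a new list with tags from pos_names applied; pos_tags is not modified."""
    replacements = dict(pos_names)
    # dict.get returns the original tag when the word has no replacement.
    return [(word, replacements.get(word, tag)) for word, tag in pos_tags]

tagged = [('I', 'PRP'), ('live', 'VBP'), ('on', 'IN'), ('Planet', 'JJ'), ('Earth', 'JJ')]
corrections = [('Planet', 'NNP'), ('Earth', 'NNP')]  # hypothetical corrections

result = retag(tagged, corrections)
print(result)
# [('I', 'PRP'), ('live', 'VBP'), ('on', 'IN'), ('Planet', 'NNP'), ('Earth', 'NNP')]
```

Because the comprehension builds a new list, the original list is left untouched, which may or may not be what you want.
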
1
tdc On

Given

pos_tags = [('This', u'DT'), ('is', u'VBZ'), ('a', u'DT'), ('test', u'NN'), ('sentence', u'NN'), ('.', u'.'), ('My', u"''"), ('name', u'NN'), ('is', u'VBZ'), ('John', u'NNP'), ('Murphy', u'NNP'), ('and', u'CC'), ('I', u'PRP'), ('live', u'VBP'), ('happily', u'RB'), ('on', u'IN'), ('Planet', u'JJ'), ('Earth', u'JJ'), ('!', u'.')]

and

names = ['John', 'Murphy']

you can do:

[next(subl for subl in pos_tags if subl[0] == name) for name in names]

which will give you:

[('John', u'NNP'), ('Murphy', u'NNP')]
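
Note that next() raises StopIteration when a name does not occur in pos_tags at all. A possible variation, sketched here with shortened sample data and a deliberately missing name ('Alice' is an assumption for illustration), passes a default to next() and compares only the word part of each pair:

```python
pos_tags = [('name', 'NN'), ('is', 'VBZ'), ('John', 'NNP'), ('Murphy', 'NNP')]
names = ['John', 'Murphy', 'Alice']  # 'Alice' is deliberately absent from pos_tags

# next() takes an optional default that is returned when the generator is exhausted.
found = [next((pair for pair in pos_tags if pair[0] == name), None) for name in names]
print(found)
# [('John', 'NNP'), ('Murphy', 'NNP'), None]
```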