I am using allennlp's OIE model, which extracts subject, predicate, and object information (in the form of ARG0, V, ARG1, etc.) and returns it embedded in nested strings. However, I need to make sure that each output is linked to the given ID of the original sentence.
I produced the following pandas dataframe, where the OIE output column contains the raw output of the allennlp algorithm.
Current output:
sentence | ID | OIE output |
---|---|---|
'The girl went to the cinema' | 'abcd' | {'verbs':[{'verb': 'went', 'description':'[ARG0: The girl] [V: went] [ARG1:to the cinema]'}]} |
'He is right and he is an engineer' | 'efgh' | {'verbs':[{'verb': 'is', 'description':'[ARG0: He] [V: is] [ARG1:right]'}, {'verb': 'is', 'description':'[ARG0: He] [V: is] [ARG1:an engineer]'}]} |
My code to get the above table:
oie_l = []
for sent in sentences:
    oie_pred = predictor_oie.predict(sentence=sent)  # allennlp OIE predictor
    for d in oie_pred['verbs']:  # get to the nested info
        d.pop('tags')  # remove unnecessary info
    oie_l.append(oie_pred)  # one prediction dict per sentence
df['OIE output'] = oie_l  # add new column to df
Desired output:
sentence | ID | OIE Triples |
---|---|---|
'The girl went to the cinema' | 'abcd' | '[ARG0: The girl] [V: went] [ARG1:to the cinema]' |
'He is right and he is an engineer' | 'efgh' | '[ARG0: He] [V: is] [ARG1:right]' |
'He is right and he is an engineer' | 'efgh' | '[ARG0: He] [V: is] [ARG1:an engineer]' |
Approach idea:
To get to the desired output of 'OIE Triples', I was considering transforming the initial 'OIE output' into a string and then using regular expressions to extract the ARGs. However, I am not sure if this is the best solution, as the ARGs can vary. Another approach would be to iterate over the nested values of 'description', replace what is currently in 'OIE output' with a list of those values, and then use the df.explode() method to expand it, so that the right sentence and ID columns stay linked to each triple after 'exploding'.
Any advice is appreciated.
Your second idea should do the trick:
In case the "OIE output" values are not truly dicts but strings, we convert them to dicts via ast.literal_eval (if they are already dicts, you can skip that conversion). Then, for each value of the series, we build a list of the "description"s found in the dicts under the outermost "verbs" key. Finally, we explode these description lists and drop the "OIE output" column, as it is no longer needed. This gives one row per triple, each still linked to its sentence and ID.
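The steps described in this answer can be sketched roughly as follows. This is a minimal, self-contained illustration rather than a definitive implementation: the toy dataframe is rebuilt from the question's table, the "OIE output" cells are assumed to be stored as strings, and the names val and result are only illustrative.

```python
import ast

import pandas as pd

# Toy data mirroring the question's table. Storing the "OIE output" cells as
# strings is an assumption made here to exercise the ast.literal_eval step.
df = pd.DataFrame({
    "sentence": [
        "The girl went to the cinema",
        "He is right and he is an engineer",
    ],
    "ID": ["abcd", "efgh"],
    "OIE output": [
        "{'verbs': [{'verb': 'went', 'description': "
        "'[ARG0: The girl] [V: went] [ARG1:to the cinema]'}]}",
        "{'verbs': [{'verb': 'is', 'description': "
        "'[ARG0: He] [V: is] [ARG1:right]'}, "
        "{'verb': 'is', 'description': "
        "'[ARG0: He] [V: is] [ARG1:an engineer]'}]}",
    ],
})

# Convert string cells to dicts; cells that are already dicts pass through.
df["OIE output"] = df["OIE output"].apply(
    lambda val: ast.literal_eval(val) if isinstance(val, str) else val
)

# Build one list of "description" strings per row.
df["OIE Triples"] = df["OIE output"].apply(
    lambda val: [d["description"] for d in val["verbs"]]
)

# Explode to one row per triple; sentence and ID are repeated for each one.
result = (
    df.explode("OIE Triples")
      .drop(columns="OIE output")
      .reset_index(drop=True)
)
print(result)
```

Because explode repeats the other columns for every list element, the sentence and ID stay attached to each triple without any regular expressions. One thing to watch for: a sentence for which the predictor returns an empty "verbs" list would end up as a single row with NaN in "OIE Triples", which you may want to drop or handle separately.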