Assuming I have a data frame like
term cnt
apple 10
apples 5
a apple on 3
blue pears 3
pears 1
How could I filter out all strings in this column that partially match another term, e.g. getting as a result
term cnt
apple 10
pears 1
without specifying which terms I want to filter on (apple|pears), but in a self-referencing manner (i.e. each term is checked against the whole column, and terms that contain another term as a partial match are removed)? The number of tokens is not limited, and matches need not fall on word boundaries (i.e. "mapples" would get matched by "apple"). This would result in an inverted, generalized, dplyr-based version of
d[grep("^apple$|^pears$", d$term), ]
Additionally, it would be interesting to use this departialisation to get a cumulative sum, e.g.
term cnt
apple 18
pears 4
I couldn't get it to work with contains() or grep().
Thanks
Hopefully this is the complete answer. It is not very idiomatic (as Pythonistas would say), but perhaps someone can suggest improvements to this:
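A minimal sketch of the idea, assuming the data frame is called d with columns term and cnt as in the question, that the terms are unique, and that "partial match" means a literal substring match (no regex). The names root_of, root, filtered and summed are made up for illustration. Each term is mapped to the shortest term in the column that it contains; a term is kept only if that shortest term is itself, and the counts are summed per shortest term.

```r
library(dplyr)

d <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the shortest term in `terms` that occurs
# as a literal substring of `x`. Every term contains itself, so there
# is always at least one hit.
root_of <- function(x, terms) {
  is_hit <- vapply(terms, function(p) grepl(p, x, fixed = TRUE), logical(1))
  hits <- terms[is_hit]
  hits[which.min(nchar(hits))]
}

# Check each term against the whole column.
d$root <- vapply(d$term, root_of, character(1),
                 terms = d$term, USE.NAMES = FALSE)

# Keep only terms that contain no shorter term from the column.
filtered <- d %>%
  filter(term == root) %>%
  select(term, cnt)

# Sum the counts of every term under the shortest term it contains.
summed <- d %>%
  group_by(root) %>%
  summarise(cnt = sum(cnt)) %>%
  rename(term = root)
```

With the sample data, filtered holds apple 10 and pears 1, and summed holds apple 18 and pears 4. The comparison is quadratic in the number of terms, so this is only meant for columns of moderate size.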