OPenRefine GREL removing words if present in another column

52 views Asked by At

I have a big list of medication names and want to remove the words that are present in column 'delete'. I've tried multiple different methods and non have worked.

enter image description here

1

There are 1 answers

0
b2m On BEST ANSWER

There are several approaches to solve this in OpenRefine. For all of them I am assuming you are in a transform dialog on the column where you want to remove the words. I am also assuming that you only want to remove one word, as shown in your example data.

Option 1: replace

value.replace(row.cells["delete"].value, "")

This approach does not work, if the words in delete may be part of a word. For example "TAB" would also be replaced in "AMATABBON". You might use regular expressions for that. But afaik GREL does not have a way to dynamically create regular expressions (yet).

Option 2: Jython

import re
regex = re.compile(r'\b'+re.escape(row.cells["delete"].value)+r'\b')
return regex.sub("", value)

Option 3: filter

filter(value.split(" "), v, v != row.cells["delete"].value).join(" ")

This expression will split the words on the separator whitespace " ", remove words that are in the corresponding row in column delete and then join the remaining words back together on whitespace " ". You may also expand this to consider multiple words in columns delete:

with(
  row.cells["delete"].value.split(" "),
  forbiddenWords,
  filter(value.split(" "), v, not(forbiddenWords.inArray(v))).join(" ")
)