Applying regex in OpenRefine with Python


I am trying to use the value.findall() function in OpenRefine 3.4 to find all the rows in a column that contain specific strings, i.e. "WASHER", "FLAT", "10MM" and "SS", in any order, and return the matches in a new column. Here is a snippet of my code.

import re
regex=r"(\WASHER)(\"FLAT")(\"10MM")(\"SS")"
return re.findall(regex, value)

Here is what my screen looks like.

[Screenshot: the data in the column]

Answer by Wiktor Stribiżew:

You need to put the following code into the box:

import re
regex=r'^(?=.*\bWASHER\b)(?=.*\bFLAT\b)(?=.*\b10MM\b)(?=.*\bSS\b).*'
return re.findall(regex, value)

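This returns a list containing the whole string when it contains WASHER, FLAT, 10MM and SS as whole words, in any order and anywhere in the string; otherwise it returns an empty list. Each (?=.*\bWORD\b) lookahead checks for one word from the start of the string without consuming any characters, so the order in which the words appear does not matter.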

See the regex demo.
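
To check the lookahead pattern outside OpenRefine, here is a minimal sketch in plain Python. The function name match_all_words and the sample strings are hypothetical, made up for illustration and not taken from the question's data; the function body is what would go into the OpenRefine expression box, where value is the cell content.

```python
import re

# Same pattern as above: one lookahead per required whole word.
REGEX = r'^(?=.*\bWASHER\b)(?=.*\bFLAT\b)(?=.*\b10MM\b)(?=.*\bSS\b).*'

def match_all_words(value):
    """Stand-in for the OpenRefine expression body (hypothetical wrapper)."""
    return re.findall(REGEX, value)

# Hypothetical sample cell values, invented for this sketch.
samples = [
    "WASHER FLAT 10MM SS",   # all four words, in order
    "SS 10MM FLAT WASHER",   # all four words, shuffled
    "WASHER FLAT 12MM SS",   # 10MM is missing
    "WASHERS FLAT 10MM SS",  # WASHERS is not the whole word WASHER
]

results = [match_all_words(s) for s in samples]
for sample, result in zip(samples, results):
    print(repr(sample), "->", result)
```

The first two samples each come back as a one-element list holding the full string, while the last two give an empty list because one of the whole-word lookaheads fails.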

If the words always occur in immediate succession, in this order and separated only by whitespace, you can use:

regex=r'.*?\bWASHER\s+FLAT\s+10MM\s+SS\b.*'

See this regex demo.
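
The difference from the lookahead version can be seen with a short sketch in plain Python. Again, match_in_sequence and the sample strings are hypothetical names and data chosen only for illustration.

```python
import re

# Pattern for the words in fixed order, separated only by whitespace.
REGEX = r'.*?\bWASHER\s+FLAT\s+10MM\s+SS\b.*'

def match_in_sequence(value):
    """Stand-in for the OpenRefine expression body (hypothetical wrapper)."""
    return re.findall(REGEX, value)

# Hypothetical sample values, invented for this sketch.
in_order = match_in_sequence("M10 WASHER FLAT 10MM SS A2")
shuffled = match_in_sequence("SS 10MM FLAT WASHER")

print(in_order)
print(shuffled)
```

Here the string with the words in the expected order is returned whole, and the shuffled one yields an empty list, so this pattern is only suitable when the order is fixed.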