Extracting email addresses from messy text in OpenRefine

Question

704 views Asked by Abi Hassen At 02 February 2018 at 22:33

I am trying to extract just the email addresses from a text column in OpenRefine. Some cells contain only the email, but others have the name and email in john doe <[email protected]> format. I have been using the following GREL/regex, but it does not return the entire email address. For the above example I'm getting ["[email protected]"].

value.match(
/.*([a-zA-Z0-9_\-\+]+@[\._a-zA-Z0-9-]+).*/
)

Any help is much appreciated.

There are 2 answers

Ettore Rizza On 02 February 2018 at 23:24

If some cells contain just the email, it's probably better to use @wiktor-stribiżew's partial match. In the development version of OpenRefine there is now a value.find() function that can do this, but it will only be officially released in the next version (2.9). In the meantime, you can reproduce it using Python/Jython instead of GREL:

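import re
# findall returns every match; guard against cells that contain no email at all
matches = re.findall(r"[^<\s]+@[^\s>]+", value)
return matches[0] if matches else None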

Wiktor Stribiżew On 02 February 2018 at 22:50 (Accepted Answer)

Only the n is captured because of the .* before the capturing group. It greedily matches any 0+ chars other than line break chars, so it first consumes the whole line and then backtracks one char at a time. The first point at which the rest of the pattern can match is when Group 1 holds just the single char right before @.
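The backtracking behaviour is easy to reproduce outside OpenRefine. The sketch below uses Python's re module and a made-up sample cell (the address jdoe@example.com is an assumption for illustration, not the asker's data) to show that the greedy leading .* leaves only one character for the part before @, while the same character classes used as a partial match return the whole address.

```python
import re

# Hypothetical sample cell; the address is made up for illustration.
cell = "john doe <jdoe@example.com>"

# The pattern from the question: the greedy leading .* swallows as much as it can.
greedy = re.compile(r".*([a-zA-Z0-9_\-\+]+@[\._a-zA-Z0-9-]+).*")

# The same character classes without the surrounding .*, used as a partial match.
partial = re.compile(r"[a-zA-Z0-9_\-\+]+@[\._a-zA-Z0-9-]+")

greedy_result = greedy.match(cell).group(1)
partial_result = partial.search(cell).group(0)

print(greedy_result)   # e@example.com
print(partial_result)  # jdoe@example.com
```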

If you can get partial matches, get rid of the .* and use

/[^<\s]+@[^\s>]+/

See the regex demo

Details

[^<\s]+ - 1 or more chars other than < and whitespace
@ - a @ char
[^\s>]+ - 1 or more chars other than whitespace and >.

Python/Jython implementation:

import re
res = ''
m = re.search(r'[^<\s]+@[^\s>]+', value)
if m:
    res = m.group(0)
return res
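The bare return above relies on OpenRefine evaluating a Jython expression as a function body with value supplied, so the snippet will not run as-is in a plain Python interpreter. For testing the same logic standalone, a minimal sketch might look like this; the function name extract_email and the sample cells are hypothetical.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
EMAIL_RE = re.compile(r"[^<\s]+@[^\s>]+")

def extract_email(value):
    """Return the first email-like token in value, or '' if none is found."""
    m = EMAIL_RE.search(value)
    return m.group(0) if m else ""

# Made-up cells covering the cases in the question, plus a cell with no email.
samples = [
    "jdoe@example.com",
    "john doe <jdoe@example.com>",
    "no address here",
]
results = [extract_email(s) for s in samples]
print(results)  # ['jdoe@example.com', 'jdoe@example.com', '']
```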

There are other ways to match these strings. In case you need a full string match, use .*<([^<]+@[^>]+)>.* where the leading .* will not gobble the name, since it has to stop before an obligatory <.
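One consequence of the obligatory < is worth seeing in code: the full string pattern only matches cells that wrap the address in angle brackets. The following is a sketch under that assumption, with a hypothetical helper name extract_bracketed and made-up sample cells.

```python
import re

# Full string pattern from the answer; Group 1 holds the address between < and >.
FULL_RE = re.compile(r".*<([^<]+@[^>]+)>.*")

def extract_bracketed(value):
    """Return the address inside <...>, or None if the cell has no such part."""
    m = FULL_RE.match(value)
    return m.group(1) if m else None

bracketed = extract_bracketed("john doe <jdoe@example.com>")
bare = extract_bracketed("jdoe@example.com")

print(bracketed)  # jdoe@example.com
print(bare)       # None
```

Because a bare address returns None here, a column that mixes both cell formats is better served by the partial match.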
