I am trying to extract just the emails from a text column in OpenRefine. Some cells have just the email, but others have the name and email in `john doe <[email protected]>` format. I have been using the following GREL/regex, but it does not return the entire email address. For the above example I'm getting `["[email protected]"]`.
value.match(
/.*([a-zA-Z0-9_\-\+]+@[\._a-zA-Z0-9-]+).*/
)
Any help is much appreciated.

The `n` is captured because you are using `.*` before the capturing group. Since `.*` greedily matches any 0 or more characters other than line break characters, it first consumes the rest of the line, and during backtracking the only character that can land in Group 1 is the one right before `@`. If you can get partial matches, get rid of the `.*` and use

`[^<\s]+@[^\s>]+`

See the regex demo.
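To see the backtracking effect outside OpenRefine, here is a minimal sketch using Python's `re` module rather than GREL. The address `john@example.com` is a hypothetical stand-in for the obfuscated one in the question. `re.match` with the original pattern reproduces the one-character capture, while `re.findall` with the partial-match pattern returns the whole address:

```python
import re

# Hypothetical sample cell; the real address in the question is not available.
cell = "john doe <john@example.com>"

# Original pattern: the leading greedy .* runs to the end of the line and then
# gives back characters one at a time, so Group 1 starts just before '@'.
greedy = re.match(r'.*([a-zA-Z0-9_\-\+]+@[\._a-zA-Z0-9-]+).*', cell)
print(greedy.group(1))  # n@example.com

# Partial-match pattern: no .* on either side, so nothing steals the local part.
print(re.findall(r'[^<\s]+@[^\s>]+', cell))  # ['john@example.com']
```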
Details
- `[^<\s]+` - 1 or more chars other than `<` and whitespace
- `@` - a `@` char
- `[^\s>]+` - 1 or more chars other than whitespace and `>`

Python/Jython implementation:
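A minimal sketch, assuming the first email-like substring in each cell is the one you want. The helper name `extract_email` and the sample cells are hypothetical; the wrapper function only exists so the snippet runs standalone:

```python
import re

def extract_email(value):
    """Return the first email-like substring of a cell, or None.

    Pattern: non-space, non-'<' chars, then '@', then non-space, non-'>' chars.
    re.search looks for it anywhere in the string (a partial match).
    """
    if value is None:
        return None
    m = re.search(r'[^<\s]+@[^\s>]+', value)
    return m.group(0) if m else None

for cell in ["john@example.com", "john doe <john@example.com>", "no address here"]:
    print(extract_email(cell))
```

Assuming OpenRefine's Jython expression editor, which provides the cell content as `value` and expects a `return` statement, the function body plus `import re` should work as the expression on its own.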
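There are other ways to match these strings. In case you need a full string match, use `.*<([^<]+@[^>]+)>.*`, where the leading `.*` will not gobble the user name part of the address, since it has to stop before an obligatory `<`. Note that because `<` is required, this pattern will not match cells that contain only a bare email.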
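GREL's `value.match()` has to match the whole cell, which is why this pattern keeps `.*` on both sides. The sketch below imitates that full-string behavior in Python, using `re.match` (anchored at the start) plus `\Z` (anchored at the end); it is an illustration outside OpenRefine, and `extract_bracketed` is a hypothetical helper name:

```python
import re

# Full-string variant: re.match anchors at the start and \Z at the end.
# Group 1 must begin right after the obligatory '<', so the leading .*
# can only backtrack to that '<' and cannot eat part of the address.
FULL_RE = re.compile(r'.*<([^<]+@[^>]+)>.*\Z')

def extract_bracketed(value):
    m = FULL_RE.match(value)
    return m.group(1) if m else None

print(extract_bracketed("john doe <john@example.com>"))  # john@example.com
print(extract_bracketed("john@example.com"))  # None (no angle brackets)
```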