I'm experimenting with regex and trying to filter out a bunch of email addresses that are embedded in some text source. The filter is based on two specific conditions:
1. Every email starts with abc
2. It follows the regular email pattern, which includes an @ followed by a . and ends specifically in com
Source:
sajgvdaskdsdsds
[email protected]
sdksdhkshdsdk[email protected]
wdgjkasdsdad
Pattern1: abc[\w\W]*[@][\w]*\.com
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {
        boolean found = false;
        String source = "[email protected]@gmail.comwdgjkasdsdad";
        String pattern1 = "abc[\\w\\W]*[@][\\w]*\\.com";
        Pattern p1 = Pattern.compile(pattern1);
        Matcher m1 = p1.matcher(source);
        System.out.println("Source:\t" + source);
        System.out.println("Exprsn:\t" + m1.pattern());
        // Report the start position and text of every match in the source
        while (m1.find())
        {
            found = true;
            System.out.println("Pos: " + m1.start() + "\tFound: " + m1.group());
        }
        System.out.println();
        if (!found)
        {
            System.out.println("Nothing found!");
        }
    }
}
I'm expecting the output to be:
Pos: 15 Found: [email protected]
Pos: 48 Found: [email protected]
But I'm getting:
Pos: 15 Found: [email protected]@gmail.com
If I use Pattern2: abc[\\w]*[@][\\w]*\\.com then I get the expected output. However, an email address can contain non-word characters after abc and before @ (for example: [email protected]), so Pattern2 doesn't work with those. That is why I went with [\\w\\W]* instead of [\\w]*.
I also tried Pattern3: abc[\\w\\W]*[@][\\w]*\\.com[^.] and it still doesn't work.
Please help: where am I going wrong?
Regex operators are greedy by default, meaning that they will grab as much of the string as they can. [\w\W]* will grab all intervening @ characters except for the very last one.
Either use the reluctant form of the operators (e.g. *? instead of *), or just simplify the expression: [^@]* will take as many characters that aren't @ as it can find. Similarly, [^.]* will match everything until the first dot.
Alternatively, you can use reluctant operators.
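As a minimal sketch of the difference, the class below runs the question's greedy pattern, a negated-character-class version, and a reluctant version over a made-up input. The sample addresses, the class name GreedyVsReluctant, and the helper findAll are assumptions for illustration only; they are not the data from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical sample text: two made-up addresses embedded in filler characters.
    static final String SOURCE = "junkabc.def@example.comfillerabc-123@gmail.comtail";

    // The question's Pattern1: the greedy [\w\W]* runs on to the last usable @.
    static final String GREEDY = "abc[\\w\\W]*[@][\\w]*\\.com";

    // Negated classes: stop at the first @, then at the first dot.
    static final String NEGATED = "abc[^@]*@[^.]*\\.com";

    // Reluctant quantifiers: take as few characters as possible.
    static final String RELUCTANT = "abc[\\w\\W]*?@\\w*?\\.com";

    // Collects every match of the given regex in the given text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // One oversized match: [abc.def@example.comfillerabc-123@gmail.com]
        System.out.println("greedy:    " + findAll(GREEDY, SOURCE));
        // Two separate matches: [abc.def@example.com, abc-123@gmail.com]
        System.out.println("negated:   " + findAll(NEGATED, SOURCE));
        // Two separate matches: [abc.def@example.com, abc-123@gmail.com]
        System.out.println("reluctant: " + findAll(RELUCTANT, SOURCE));
    }
}
```

Both corrected forms still begin at the first abc they encounter, so a stray abc in the surrounding text with no @ of its own would be swept into the next match; tightening the character class after abc is one way to guard against that if it matters for your input.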