I'm experimenting with regex and trying to extract a bunch of email addresses that are embedded in some source text. The filter is based on two specific conditions:
1. Every email starts with abc
2. It follows the regular email pattern: an @ followed by a . and ending specifically in com
Source:
sajgvdaskdsdsds[email protected]sdksdhkshdsdk[email protected]wdgjkasdsdad
Pattern1 = "abc[\w\W]*[@][\w]*\.com"
Code:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Test {
        /**
         * @param args the command line arguments
         */
        public static void main(String[] args)
        {
            boolean found = false;
            String source = "[email protected]@gmail.comwdgjkasdsdad";
            String pattern1 = "abc[\\w\\W]*[@][\\w]*\\.com";
            Pattern p1 = Pattern.compile(pattern1);
            Matcher m1 = p1.matcher(source);
            System.out.println("Source:\t" + source);
            System.out.println("Exprsn:\t" + m1.pattern());
            // Print the start position and text of every match
            while (m1.find())
            {
                found = true;
                System.out.println("Pos: " + m1.start() + "\tFound: " + m1.group());
            }
            System.out.println();
            if (!found)
            {
                System.out.println("Nothing found!");
            }
        }
    }
I'm expecting this output:
Pos: 15 Found: [email protected]
Pos: 48 Found: [email protected]
But I'm getting:
Pos: 15 Found: [email protected]@gmail.com
If I use Pattern2, abc[\\w]*[@][\\w]*\\.com, I get the expected output. However, an email address can contain non-word characters after abc and before @ (for example: [email protected]).
Pattern2 doesn't work with non-word characters, so I went with [\\w\\W]* instead of [\\w]*.
I also tried Pattern3, abc[\\w\\W]*[@][\\w]*\\.com[^.], and it still doesn't work.
Where am I going wrong?
Regex operators are greedy by default, meaning that they will grab as much of the string as they can. [\w\W]* will grab all intervening @ characters except for the very last one.
Either use the reluctant form of the operators (e.g. *? instead of *), or just simplify the expression with negated character classes: [^@]* will take as many characters that aren't @ as it can find. Similarly, [^.]* will match everything until the first dot.
Alternatively, you can use reluctant operators, e.g. [\w\W]*? in place of [\w\W]*.
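To make the difference concrete, here is a minimal sketch that runs the greedy pattern from the question next to a reluctant variant and a negated-character-class variant. The two addresses in SOURCE are made up for illustration (the real ones are not shown in the question), and the class name, the constants, and the findAll helper are hypothetical names rather than anything from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical sample text: both embedded addresses are invented for this sketch.
    static final String SOURCE =
        "sajgvdaskdsdsdsabc.def@gmail.comsdksdhkshdsdkabc-xyz@gmail.comwdgjkasdsdad";

    // Pattern1 from the question: [\w\W]* is greedy and runs on to the last '@'.
    static final String GREEDY = "abc[\\w\\W]*[@][\\w]*\\.com";

    // Same pattern, but the first quantifier is reluctant (*?), so it stops at the first '@'.
    static final String RELUCTANT = "abc[\\w\\W]*?[@][\\w]*\\.com";

    // Simplified form: [^@]* cannot cross an '@', and [^.]* cannot cross a '.'.
    static final String NEGATED = "abc[^@]*@[^.]*\\.com";

    // Collects the text of every match of regex in source, in order.
    static List<String> findAll(String regex, String source) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("greedy:    " + findAll(GREEDY, SOURCE));
        System.out.println("reluctant: " + findAll(RELUCTANT, SOURCE));
        System.out.println("negated:   " + findAll(NEGATED, SOURCE));
    }
}
```

With this sample text, the greedy pattern yields a single match spanning both addresses, while the reluctant and negated variants each yield the two addresses separately. Note that [^@]* and [^.]* still match any other character, so on real input you may want to restrict them further to the characters you consider valid in an address.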