I wrote some Java code to split a string into an array of strings. First I split the string using the regex pattern "\\,\\,|\\,", and then using the pattern "\\,|\\,\\,". Why is the output of the first different from the output of the second?
public class Test2 {
    public static void main(String[] args) {
        String regex1 = "\\,\\,|\\,";
        String regex2 = "\\,|\\,\\,";
        String a = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";
        String[] ss = a.split(regex1);
        int index = 0;
        for (String m : ss) {
            System.out.println((index++) + ": " + m + "|");
        }
    }
}
Output when using regex1:
0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: 3000054789|
And when using regex2:
0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: |
13: 3000054789|
I need an explanation of how the regex engine handles this situation.
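How the regex engine works: it scans the input from left to right, and at each position it tries the alternatives of an alternation from left to right, stopping at the first one that matches. (The backslashes in the question are harmless; a comma needs no escaping, so they are omitted here.)
",|,," is equivalent to ",", because the first alternative always matches before the second is ever tried. Each comma in ",," is therefore treated as a separate delimiter, and the empty string between them shows up as element 12.
",,|," is equivalent to ",,?": the engine tries to match two commas first and falls back to a single comma only when that fails.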
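The left-to-right behaviour can be checked directly with java.util.regex. The following is a minimal sketch: the class name AlternationOrder, the helper matches, and the shortened input are made up for illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Collects every substring the pattern matches, scanning left to right.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened tail of the string from the question (illustrative).
        String input = "PURWANTO,,3000054789";

        // The single comma is tried first and always succeeds,
        // so the double comma is consumed as two separate matches.
        System.out.println(matches(",|,,", input).size()); // 2

        // The double comma is tried first, so ",," is one match.
        System.out.println(matches(",,|,", input).size()); // 1
    }
}
```

With two matches, split sees two delimiters with nothing between them, which is where the empty element in the second output comes from.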
However, you should use ",,?" instead, so that there's no backtracking.
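As a quick check that ",,?" splits the same way as ",,|,", here is a small sketch using a shortened version of the input from the question. OptionalCommaSplit and splitOn are hypothetical names chosen for this example.

```java
import java.util.Arrays;

class OptionalCommaSplit {
    // Thin wrapper around String.split so the three patterns can be compared.
    static String[] splitOn(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Last three fields of the string from the question (illustrative).
        String input = "ID0010015,WADORI PURWANTO,,3000054789";

        // Two commas are consumed as one delimiter.
        System.out.println(Arrays.toString(splitOn(",,|,", input)));
        // [ID0010015, WADORI PURWANTO, 3000054789]

        // Same result: one comma, optionally followed by a second.
        System.out.println(Arrays.toString(splitOn(",,?", input)));
        // [ID0010015, WADORI PURWANTO, 3000054789]

        // Each comma is its own delimiter, leaving an empty element.
        System.out.println(Arrays.toString(splitOn(",|,,", input)));
        // [ID0010015, WADORI PURWANTO, , 3000054789]
    }
}
```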