Regular expression to replace all pipe symbols between double quotes


I have the following string received as part of a file record

1234567890|ABCDE|""|"01|02|03|"|453625|New Account|05736372828|NA|||AT|899

The record uses the pipe symbol | as a delimiter. However, if a | appears inside double quotes ", it should not be treated as a delimiter, and the quoted text should be kept as a single field, e.g. "01|02|03|".

I am trying to use a regex to convert the "01|02|03|" data into "01,02,03," before splitting the string on the | delimiter, but the regex is not working as expected.

Below is the code I wrote for this, based on another SO question, "Regular expression, replace all commas between double quotes":

public static void main(String[] args) {
    String orig = "1234567890|ABCDE|\"\"|\"01|02|03|\"|453625|New Account|05736372828|NA|||AT|899";
    String regex = "(?<=\")([^\"]+?)\\|([^\"]+?)(?=\")";  
    String old = orig;
    String result = orig.replaceAll(regex, "$1,$2");
    while (!result.equalsIgnoreCase(old)){  
        old = result;  
        result = result.replaceAll(regex, "$1,$2");  
    }
    System.out.println(result);
}

The output from the above code is 1234567890|ABCDE|""|"01,02,03|"|453625|New Account|05736372828|NA|||AT|899 which is not as expected. The | after 03 in "01|02|03|" is not getting replaced with ,.

I would appreciate it if someone could help correct the regex, or share a new regex that splits the string while keeping any | that appears inside double quotes.


There are 2 answers

Answer by blhsing

You can use a positive lookahead to match only the pipes that are followed by an odd number of double quotes, provided that the double quotes are properly paired. A pipe followed by an odd number of quotes must sit inside a quoted section:

String result = orig.replaceAll("\\|(?=[^\"]*\"(?:(?:[^\"]*\"){2})*[^\"]*$)", ",");

Demo: https://ideone.com/s0TOq0
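
Since the end goal in the question is to split the record, the same quote-counting idea can be turned around: split only on pipes that are followed by an even number of double quotes. The following is a minimal sketch, not part of the original answer; the class and method names are made up for illustration, and it assumes the quotes are balanced and never escaped inside a field.

```java
import java.util.regex.Pattern;

class QuotedPipeSplit {
    // A pipe is outside quotes when an even number of double quotes follows it.
    // Assumption: quotes are balanced and are never escaped inside a field.
    private static final Pattern PIPE_OUTSIDE_QUOTES =
            Pattern.compile("\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] splitOutsideQuotes(String record) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return PIPE_OUTSIDE_QUOTES.split(record, -1);
    }

    public static void main(String[] args) {
        String orig = "1234567890|ABCDE|\"\"|\"01|02|03|\"|453625|New Account|05736372828|NA|||AT|899";
        for (String field : splitOutsideQuotes(orig)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

With this variant the quoted field keeps its original pipes, so no replacement with commas is needed. The lookahead rescans the rest of the string for every pipe, which is fine for short records but may be slow on very long lines.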

Answer by Ahmed Mera

You can use the regular expression "([^"]*)" to match each double-quoted section and capture the content between the quotes in group 1. Then iterate over the matches and replace any pipes in the captured content with commas.

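    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {

            String orig = "1234567890|ABCDE|\"01|02|03|\"|453625|New Account|05736372828|NA|||AT|899";

            // Matches one double-quoted section; group 1 is the text between the quotes.
            String regex = "\"([^\"]*)\"";

            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(orig);

            StringBuffer result = new StringBuffer();

            while (matcher.find()) {
                // Replace the pipes inside the quoted text only.
                String replacement = matcher.group(1).replace('|', ',');
                // quoteReplacement keeps any '$' or '\' in the data literal.
                matcher.appendReplacement(result, Matcher.quoteReplacement("\"" + replacement + "\""));
            }

            // Copy whatever follows the last quoted section.
            matcher.appendTail(result);

            System.out.println(result);
        }
    }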

Output: 1234567890|ABCDE|"01,02,03,"|453625|New Account|05736372828|NA|||AT|899
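
On Java 9 or later, the find/appendReplacement loop can probably be shortened with the Matcher.replaceAll overload that takes a function. This is a sketch of the same approach rather than part of the original answer; the class and method names are hypothetical, and it again assumes quotes are balanced and unescaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedPipeReplace {
    // Matches one double-quoted section, including the surrounding quotes.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String replacePipesInQuotes(String record) {
        // For each quoted section, swap its pipes for commas.
        // quoteReplacement keeps any '$' or '\' in the data literal.
        return QUOTED.matcher(record).replaceAll(
                m -> Matcher.quoteReplacement(m.group().replace('|', ',')));
    }

    public static void main(String[] args) {
        String orig = "1234567890|ABCDE|\"\"|\"01|02|03|\"|453625|New Account|05736372828|NA|||AT|899";
        System.out.println(replacePipesInQuotes(orig));
    }
}
```

Because each quoted section is handled in a single pass, the trailing | before the closing quote is replaced as well, and the empty "" field is left unchanged.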