Parse escaped comma in a csv file using PHP


I'm trying to parse a CSV file, but the following line contains an escaped comma that str_getcsv() does not handle the way I expect.

<?php
$str = "19018216307,Public,\,k]'=system1-system2,20230914143505.5,1-050000,No";
$data = str_getcsv($str);
?>

Output:

Array
(
    [0] => 19018216307
    [1] => Public
    [2] => \
    [3] => k]'=system1-system2
    [4] => 20230914143505.5
    [5] => 1-050000
    [6] => No
)

Consider the column value \,k]'=system1-system2. I expect it to be parsed as ,k]'=system1-system2, but str_getcsv() treats it as two columns: \ and k]'=system1-system2.

Expected output:

Array
(
    [0] => 19018216307
    [1] => Public
    [2] => ,k]'=system1-system2
    [3] => 20230914143505.5
    [4] => 1-050000
    [5] => No
)

NOTE: The CSV file is raw data generated by an external website, so I can't change its content (e.g. by wrapping column values in double quotes).

Thanks in advance!


1 answer

Casimir et Hippolyte

A workaround for your strange "csv format":

$str = "19018216307,Public,\,k]'=system1-system2,20230914143505.5,1-050000,No";

$pattern = <<<'REGEX'
~(?nxx)
    (?# modifiers:
        - inline n: plain parentheses act as non-capturing groups
        - inline xx: whitespace is ignored, even inside character classes
        - global A: anchored, so all the matches have to be contiguous
    )

    # pattern
    ( (?!\A) , \K | \A ) # a comma when not at the start, or the start itself
    [^ , \\ ]* ( \\ . [^ , \\ ]* )* # field content (anything that is neither a comma
                                    # nor a backslash, plus escaped characters)

    # final check
    ( \z (*:END) )? # set a marker if the end of the string is reached
~A
REGEX;

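// The MARK key is only set when a match reached the end of the string,
// i.e. when the whole line was consumed by contiguous matches.
if (preg_match_all($pattern, $str, $m) && isset($m['MARK'])) {
    // Unescape: a doubled backslash becomes one, a single backslash is dropped.
    $result = array_map(fn($s) => strtr($s, ['\\\\' => '\\', '\\' => '']), $m[0]);
    print_r($result);
}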


If the format allows newlines inside a field, add the s modifier, i.e. ~As at the end of the pattern or (?nxxs) at the start.
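
For comparison, the same split can probably be done without a regex by scanning the line one character at a time. The following is only a minimal sketch, not part of the answer above: the function name splitEscapedLine and its parameters are hypothetical, and it assumes that a backslash escapes whatever single character follows it. Unlike the regex version, it keeps a lone trailing backslash as a literal character instead of rejecting the line.

```php
<?php
// Hypothetical helper (assumption, not from the answer above): splits one
// line on unescaped delimiters and drops the escaping backslashes.
function splitEscapedLine(string $line, string $delimiter = ',', string $escape = '\\'): array
{
    $fields = [];
    $current = '';
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];

        if ($char === $escape && $i + 1 < $length) {
            // Escaped character: skip the backslash, keep the next byte literally.
            $current .= $line[++$i];
        } elseif ($char === $delimiter) {
            // Unescaped delimiter: close the current field.
            $fields[] = $current;
            $current = '';
        } else {
            // Ordinary character (or a lone backslash at the very end).
            $current .= $char;
        }
    }

    // The last field has no delimiter after it.
    $fields[] = $current;

    return $fields;
}

$str = "19018216307,Public,\,k]'=system1-system2,20230914143505.5,1-050000,No";
print_r(splitEscapedLine($str));
```

Scanning byte by byte should be safe for UTF-8 input, since a comma or a backslash never occurs inside a multi-byte sequence. For a whole file, the function could be called on each line read with fgets(), as long as fields never contain raw newlines.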