I have a string that I want to explode by commas but only if the comma is not nested inside some parentheses. This is a fairly common use case, and I have been reading through answered posts in this forum, but not really found what I am looking for.

So, in detail: the point is, i have a string (= SQL SELECT ... FROM statement), and I want to extract the elements from the list separated by comma encoded in this string (= the name of the columns that one wants to select from. However, these elements can contain brackets, and effectively be function calls. For example, in SQL one could do

SELECT TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate, nameoftheguy FROM shippings WHERE ...

Obviously, I would like to have an array now containing as first element

TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate

and as second element

nameoftheguy

The approaches I have followed so far are PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets), PHP: Split a string by comma(,) but ignoring anything inside square brackets?, Explode string except where surrounded by parentheses?, and PHP: split string on comma, but NOT when between braces or quotes? (focussing on the regex expressions therein, since I would like to do it with a single regex line), but in my little test area, those do not give the proper result. In fact, all of them split nothing or too much:

$Input: SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM
$Output: Array ( [0] => first [1] => second [2] => to_char [3] => (my,big,house) [4] => as [5] => bigly [6] => export [7] => (mastermind and others) [8] => as [9] => aloah )

The code of my test area is

<?php
function test($sql){
    $foo = preg_match("/SELECT(.*?)FROM/", $sql, $match);
    $bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
    //$bar = preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $match[1], $list);
    //$bar = preg_match_all("/[,]+(?![^\[]*\])/", $match[1], $list);
    //$bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
    //$bar = preg_match_all("/[^(,\s]+|\([^)]+\)/", $match[1], $list);
    //$bar = preg_match_all("/([(].*?[)])|(\w)+/", $match[1], $list);
    print "<br/>";
    return $list[0];
}

print_r(test("SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM"));
?>

As you can imagine, I am not an regex expert, but I would like to do this splitting in a single line, if it is possible.

1 Answers

0
conni On Best Solutions

Following the conversation here, I did write a parser to solve this problem. It is quite ugly, but it does the job (at least within some limitations). For completeness (if anybody else might run into the same question), I post it here:

function full($sqlu){
    $sqlu = strtoupper($sqlu);
    if(strpos($sqlu, "SELECT ")===false || strpos($sqlu, " FROM ")===false) return NULL;
    $def      = substr($sqlu, strpos($sqlu, "SELECT ")+7, strrpos($sqlu, " FROM ")-7);
    $raw      = explode(",", $def);
    $elements = array();
    $rem      = array();
    foreach($raw as $elm){
        array_push($rem, $elm);
        $txt = implode(",", $rem);
        if(substr_count($txt, "(") - substr_count($txt, ")") == 0){
            array_push($elements, $txt);
            $rem = array();
        }
    }
    return $elements;
}

When feeding it with the following string

SELECT first, second, to_char(my,(big, and, fancy),house) as bigly, (SELECT myVar,foo from z) as super, export(mastermind and others) as aloah FROM table

it returns

Array ( [0] => first [1] => second [2] => to_char(my,(big, and, fancy),house) as bigly [3] => (SELECT myVar,foo from z) as super [4] => export(mastermind and others) as aloah )