Regex for deconstructing SQL where statement

100 views Asked by At

Looking for a fancy regex to split partial SQL where statement. Legacy app is quite old and records below are manually entered, which allows for a lot of mistakes. Given below where code I need to split it by brackets. I also need to record logical operator so after processing each statement I could construct it back.

SQL where scenarios:

my $view2 = "test1=1 AND ( test2=2 ) AND (test3=4 OR test3=9 OR test3=3 OR test3=5)"
EXPECTED array:
1: "test1=1"
2: "test2=2"
3: "test3=4 OR test3=9 OR test3=3 OR test3=5"

my $view2 = "( test1=2 AND test2=1 ) OR ( test1=2 AND test2=6 ) OR ( test1=2 AND test2=8 ) OR ( test1=2 AND test2=10 )"
EXPECTED array:
1: "test1=2 AND test2=1"
2: "test1=2 AND test2=6"
3: "test1=2 AND test2=8"
4: "test1=2 AND test2=10"

my $view3 = "test1 = '1' AND test3 = '3'"
EXPECTED array:
1: "test1 = '1' AND test3 = '3'"

I've tried splitting each statement and logic separately, one array with sql code, another array with logic operators. I also tried one array with sql code and logical operators. If code is wrapped properly in brackets, all is working ($view2) but if its not then that part is omitted ($view1 and $view3).

This is what I have so far

my $view1 = "test1=1 AND ( test2=2 ) AND (test3=4 OR test3=9 OR test3=3 OR test3=5)";
my $view2 = "( test1=2 AND test2=1 ) OR ( test1=2 AND test2=6 ) OR ( test1=2 AND test2=8 ) OR ( test1=2 AND test2=10 )";
my $view3 = "test1 = '1' AND test3 = '3'";

# stmt and logic separated
my @arr = ( $view1 =~ /\([^)]*\)/g );
my @logic= ( $view1 =~ /\)[^)]*\(/g );

# stmt and logic in one array
my @arrOne = $view =~ /\([^)]*[^(]*\)|\w+/g;


my $counter = 1;
foreach my $record (@arr) {
    $record =~ s/\( | \)//g;
    print "\n" . $counter . ":";
    print $record;
    $counter += 1;
}

my $counter_logic = 1;
foreach my $record (@logic) {
    $record =~ s/\) | \(//g;
    print "\n";
    print $record;
}

How can I add part of sql code without brackets to regex pattern? Not sure if it's even doable? Any idea how to achieve it?

2

There are 2 answers

1
niebyl2002 On

So after multiple failures I decided to use ChatGPT and believe it or not code seems to be working.

First I needed to clean string from single quotes and whitespaces and then following pattern did the trick

my $pattern = qr/\b\w+\s*=\s*(?:'\w+'|\d+)\b|\(.*?\)/;
my @arr = $view =~ /$pattern/g;
0
Bork On

This is not a complete answer, but a demonstration of a strategy. You could try pre-parsing your input statements to extract parenthesised blocks, and replace them with placeholders. After that, it should be trivial to parse the other statements. And with the placeholders in the original string, it is easy to reassemble it.

Note that it will not work for nested parentheses.

use strict;
use warnings;
use Data::Dumper;

my @data;
while (<DATA>) {
    my %block;
    my $count = 0;
    chomp;
    $block{original} = $_;
    while (s/\(([^)]+)\)/block$count/) {
        $block{"block$count"} = $1;
        $count++;
        if ($count > 100) {  # safety precaution
            warn "Too many blocks!"; last;
        }
    }
    $block{main} = $_;
    push @data, \%block;
}
print Dumper \@data;

__DATA__
test1=1 AND ( test2=2 ) AND (test3=4 OR test3=9 OR test3=3 OR test3=5)
( test1=2 AND test2=1 ) OR ( test1=2 AND test2=6 ) OR ( test1=2 AND test2=8 ) OR ( test1=2 AND test2=10 )
test1 = '1' AND test3 = '3'

Output:

$VAR1 = [
          {
            'main' => 'test1=1 AND block0 AND block1',
            'block0' => ' test2=2 ',
            'original' => 'test1=1 AND ( test2=2 ) AND (test3=4 OR test3=9 OR test3=3 OR test3=5)',
            'block1' => 'test3=4 OR test3=9 OR test3=3 OR test3=5'
          },
          {
            'block0' => ' test1=2 AND test2=1 ',
            'main' => 'block0 OR block1 OR block2 OR block3',
            'original' => '( test1=2 AND test2=1 ) OR ( test1=2 AND test2=6 ) OR ( test1=2 AND test2=8 ) OR ( test1=2 AND test2=10 )',
            'block1' => ' test1=2 AND test2=6 ',
            'block3' => ' test1=2 AND test2=10 ',
            'block2' => ' test1=2 AND test2=8 '
          },
          {
            'original' => 'test1 = \'1\' AND test3 = \'3\'',
            'main' => 'test1 = \'1\' AND test3 = \'3\''
          }
        ];