How to parse a string into a multidimensional array (regex?)


I need to split the data into an array of blocks. How can I do this? Do I need to use regex? My script gives me errors because I cannot separate the data the way I want. Does anyone have any ideas?

Data:

~0 
11111111
~1 
222222222
~2 
3333333333

        ~end 
~0 
aaaaaaaaaaa
~1 
bbbbbbbbbb
~2 
cccccccccc
~3 
ddddddddddd 

        ~end 



~0 
yyyyyyyyyyy
xxxxxxxx
ffffffffff
~1 
rrrrrrrrrrrr
        ~end 

I need it like this:

Array
(
    [0] => Array
        (
            [0] => 11111111
            [1] => 222222222
            [2] => 3333333333
        )

    [1] => Array
        (
            [0] => aaaaaaaaaaa
            [1] => bbbbbbbbbb
            [2] => cccccccccc
            [3] => ddddddddddd
        )

    [2] => Array
        (
            [0] => yyyyyyyyyyy
xxxxxxxx
ffffffffff
            [1] => rrrrrrrrrrrr
        )

)

My code (fails):

$texto = "~0 
11111111
~1 
222222222
~2 
3333333333

        ~end 
~0 
aaaaaaaaaaa
~1 
bbbbbbbbbb
~2 
cccccccccc
~3 
ddddddddddd 

        ~end 



~0 
yyyyyyyyyyy
xxxxxxxx
ffffffffff
~1 
rrrrrrrrrrrr
        ~end";

preg_match_all("/(?ms)^~0.*?~end/", $texto, $coincidencias);

foreach ( $coincidencias[0] as $bloque ){
    preg_match_all("/\~.*\n/", $bloque, $sub_bloques);
    $hola[] = $sub_bloques;
}

There are 2 answers

Don't Panic

Here is one non-regex way: split the string into lines and iterate over them. Add each line that is neither empty nor a ~ marker to a sub-array. When you reach an ~end line, append the sub-array to the main array and start a new one.

$sub_bloques = [];
$hola = [];

foreach(array_map('trim', explode("\n", $texto)) as $line) {
    if ($line && substr($line, 0, 1) !== '~') {
        $sub_bloques[] = $line;
    }
    if ($line == '~end') {
        $hola[] = $sub_bloques;
        $sub_bloques = [];
    }
}
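The loop above adds every non-marker line as its own entry, so the three lines under the third block's ~0 (yyyyyyyyyyy, xxxxxxxx, ffffffffff) come out as three separate values rather than the single multi-line value in the desired output. A minimal sketch of a variant that buffers the lines after each ~N marker and joins them with "\n" when the next marker arrives, assuming every value is introduced by a ~N line and every block ends with ~end. The `$valor` buffer and the shortened sample string are illustrative, not from the question:

```php
<?php
// Shortened sample in the same format as the question (illustrative data).
$texto = "~0 \n11111111\n~1 \n222222222\n        ~end \n~0 \nyyyyyyyyyyy\nxxxxxxxx\nffffffffff\n~1 \nrrrrrrrrrrrr\n        ~end";

$hola = [];          // one entry per ~end-terminated block
$sub_bloques = [];   // values collected for the current block
$valor = null;       // lines of the value being read (hypothetical buffer), or null

foreach (array_map('trim', explode("\n", $texto)) as $line) {
    if (substr($line, 0, 1) === '~') {
        // Any marker (~0, ~1, ..., ~end) closes the value being accumulated.
        if ($valor) {
            $sub_bloques[] = implode("\n", $valor);
        }
        $valor = null;

        if ($line === '~end') {
            $hola[] = $sub_bloques;   // the block is complete
            $sub_bloques = [];
        } else {
            $valor = [];              // a ~N marker starts a new value
        }
    } elseif ($line !== '' && $valor !== null) {
        $valor[] = $line;             // first or continuation line of the current value
    }
}

var_export($hola);
```

Checking `$line !== ''` instead of relying on truthiness also keeps a value line consisting of just "0".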

For a regex solution, start by exploding on ~end to break the main text into sections, then use preg_match_all on each section to find the lines that meet your conditions.

foreach (explode('~end', $texto, -1) as $section) {
    preg_match_all('/\n *(?!~)(\w+)/', $section, $matches);
    if ($matches[1]) $result[] = $matches[1];
}

(?!~) is a negative lookahead; it excludes lines that start with ~. Maybe there's some way to do the whole thing with one big cool regex, but I'm not that good at it.
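Like the loop, this pattern captures each line on its own, so the question's third block comes back as four entries (yyyyyyyyyyy, xxxxxxxx, ffffffffff, rrrrrrrrrrrr) instead of two, and `\w+` keeps only word characters. A minimal sketch of a variant that keeps the `explode('~end', ...)` step but swaps `preg_match_all` for `preg_split` on the marker lines so multi-line values stay together, assuming each marker sits on its own line as ~ followed by digits. The sample string is illustrative:

```php
<?php
// Shortened sample in the same format as the question (illustrative data).
$texto = "~0 \n11111111\n~1 \n222222222\n        ~end \n~0 \nyyyyyyyyyyy\nxxxxxxxx\nffffffffff\n~1 \nrrrrrrrrrrrr\n        ~end";

$result = [];
// The -1 limit drops the empty piece after the final ~end.
foreach (explode('~end', $texto, -1) as $section) {
    // Split on whole marker lines such as "~0 " (m flag: ^ and $ match at line boundaries).
    $pieces = preg_split('/^[ \t]*~\d+[ \t]*$/m', $section);

    // Trim the newlines and spaces around each piece (inner newlines are kept)
    // and drop empty pieces. The arrow function needs PHP 7.4 or later.
    $values = array_values(array_filter(
        array_map('trim', $pieces),
        fn($v) => $v !== ''
    ));

    if ($values) {
        $result[] = $values;
    }
}

var_export($result);
```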

mickmackusa

Because you want your sub-blocks grouped by block in the output array, the method needs two steps. The reason is that your blocks contain differing numbers of sub-blocks, and a single regex match cannot return a variable number of capture groups.
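To illustrate the point about capture groups: in PCRE, a group inside a repeated (quantified) group still occupies a single slot in the match array, and after the match it holds only what its last iteration captured. A small sketch on made-up input:

```php
<?php
// One match over three "~N value" pairs, using a repeated group (made-up input).
preg_match('/^(?:~\d+\s+(\S+)\s*)+$/', "~0 aaa ~1 bbb ~2 ccc", $m);

// Group 1 is reported once and keeps only its last iteration,
// so "aaa" and "bbb" are not available in $m.
var_export($m);
```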

Code:

// This delivers the sub-blocks in their relative blocks as requested in the OP
foreach (preg_split('/\s+~end\s*/',$texto) as $bloque) {
    if(preg_match_all('/(?:\~\d+\s+)\K.+?(?:\s+\S+)*?(?=\s+\~|$)/',$bloque,$sub_bloques)){
        $hola[]=$sub_bloques[0];
    }
}
var_export($hola);

Output (reformatted/condensed to save space on this page):

array(
    array('11111111','222222222','3333333333'),
    array('aaaaaaaaaaa','bbbbbbbbbb','cccccccccc','ddddddddddd'),
    array('yyyyyyyyyyy
xxxxxxxx
ffffffffff','rrrrrrrrrrrr')
)
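The per-block pattern is compact, so here is one way to read it: a sketch that writes the same expression with the x (extended) modifier, where whitespace in the pattern is ignored and # starts a comment, applied to a block shaped like the question's third one. The `$bloque` string is illustrative:

```php
<?php
// The per-block pattern from the answer, rewritten with the x flag so each part can be commented.
$pattern = '/
    (?:~\d+\s+)     # a marker such as "~0 " plus the whitespace and newline after it
    \K              # drop the marker from the reported match
    .+?             # first line of the value (lazy; "." stops at newlines without the s flag)
    (?:\s+\S+)*?    # further whitespace-separated chunks, e.g. continuation lines
    (?=\s+~|$)      # stop before the next marker or at the end of the block
/x';

// One block as produced by the preg_split on ~end (illustrative data).
$bloque = "~0 \nyyyyyyyyyyy\nxxxxxxxx\nffffffffff\n~1 \nrrrrrrrrrrrr";

preg_match_all($pattern, $bloque, $sub_bloques);
var_export($sub_bloques[0]);
```

Because of the lazy quantifiers, the match for ~0 keeps absorbing continuation lines only until the lookahead sees the next ~ marker.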

Alternatively, if you want all sub-blocks listed in a 1-dim array (not divided by blocks), the output array can be built in one step:

if(preg_match_all("/(?:\~\d+\s*)\K.+?(?:\s+\S+)*?(?=\s+\~)/s", $texto, $coincidencias)){
    var_export($coincidencias[0]);
}

1-dim output:

array (
    0 => '11111111',
    1 => '222222222',
    2 => '3333333333',
    3 => 'aaaaaaaaaaa',
    4 => 'bbbbbbbbbb',
    5 => 'cccccccccc',
    6 => 'ddddddddddd',
    7 => 'yyyyyyyyyyy
xxxxxxxx
ffffffffff',
    8 => 'rrrrrrrrrrrr',
)