PHP Regex to extract columns of text delimited by multiple spaces

Question

563 views Asked by Shaheeb Roshan At 17 June 2015 at 18:08

I have a chunk of text extracted from a tabular layout that resembles this:

Waiting Period                             30 days of employment                 30 days of employment                    30 days of employment
 Benefit amount                                   Flat $150,000                        Flat $100,000                              Flat $60,000
 Maximum benefit                                    $150,000                              $100,000                                  $60,000
 Contributions                                  Noncontributory                       Noncontributory                           Noncontributory
   Participation requirement                         100.00%                               100.00%                                  100.00%
---
Benefit amount                                            Flat $40,000                                                  Flat $20,000
 Maximum benefit                                             $40,000                                                       $20,000
 Compulsory coverage                                            Yes                                                           Yes
 Contributions                                           Noncontributory                                               Noncontributory
Waiting Period                                        30 days of employment                                      30 days of employment

Phrases such as Waiting Period or Contributions are the row labels. Each label is followed by a variable number of columns, separated by a variable amount of whitespace.

I am struggling to land on a regular expression that can target a particular row by its label and then extract the contents of that row's columns, however many there are. I think I have constructed the label identifier and the capture group that identifies a column, but the expression seems to stop after the first column.

(?:\s*Waiting Period)(?:(?:\s{2,})(.*?)(?:\s{2,}|\n|$))

The above expression in preg_match_all:

preg_match_all("/(?:\s*Waiting Period)(?:(?:\s{2,})(.*?)(?:\s{2,}|\n))/", $input_lines, $output_array);

produces:

array(2) {
0   =>  array(2){
                    0   =>  Waiting Period                             30 days of employment
                    1   =>  
                            Waiting Period                                        30 days of employment 
                }
1   =>  array(2){
                    0   =>  30 days of employment
                    1   =>  30 days of employment   
                }
}

As you can see, the match correctly identifies the target rows based on the label, extracts the first column, and then quits. I don't know how to make it keep going until the end of each matched line.

My question is: is my regex approach salvageable to accomplish my objective? Or, have I misunderstood preg_match_all and will only ever get one instance of the capture subgroup?

There is 1 answer

**Avinash Raj** · Answer 1 · 2015-06-17T18:12:15+00:00

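This happens because (?:\s{2,}|\n) matches two or more whitespace characters or a newline, so the lazy (.*?) stops at the first run of spaces after the first column. The label is also a required part of the pattern, so preg_match_all can find only one match per row. To capture everything after the label up to the end of the line, use the following with the m modifier, so that ^ matches at the start of every line: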

^\s*Waiting Period\s{2,}(.*)
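
The pattern above captures the rest of the row as one string, so the individual columns still have to be separated. A minimal sketch of one way to do that, assuming columns are always separated by at least two whitespace characters and never contain two consecutive spaces themselves. extractColumns() is a hypothetical helper name, and the sample input is shortened from the question. The sketch uses [ \t] rather than \s around the label so that a match cannot run across a line break.

```php
<?php
// Shortened sample of the question's text (nowdoc, so "$" is literal).
$input = <<<'TXT'
Waiting Period      30 days of employment      30 days of employment
 Benefit amount      Flat $150,000      Flat $100,000
TXT;

// Hypothetical helper: returns one array of column values per row
// whose label matches $label.
function extractColumns(string $text, string $label): array
{
    // m modifier: ^ and $ match at the start and end of every line.
    // preg_quote() escapes any regex metacharacters in the label.
    $pattern = '/^[ \t]*' . preg_quote($label, '/') . '[ \t]{2,}(.*)$/m';
    preg_match_all($pattern, $text, $matches);

    $rows = [];
    foreach ($matches[1] as $rest) {
        // Split the remainder of the line on runs of two or more
        // whitespace characters; single spaces stay inside a column.
        $rows[] = preg_split('/\s{2,}/', trim($rest), -1, PREG_SPLIT_NO_EMPTY);
    }
    return $rows;
}

print_r(extractColumns($input, 'Waiting Period'));
```

Calling extractColumns($input, 'Benefit amount') on the same text would return the two dollar amounts for that row. Splitting in a second step sidesteps the limitation the question ran into: a capture group inside a single match keeps only one value, however many columns follow.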
