Regular expression for wrapping all tr elements that contain th tags in a thead


I have a problem with a regex. I need to wrap all the tr elements that contain th and put them in a thead. I have a variable $html which contains an HTML table like this:

$html ="
<table>
<tr>
  <th>header1</th> 
  <th>header2</th>
  <th>header3</th>
</tr>
<tr>
  <th>header21</th> 
  <th>header22</th>
  <th>header23</th>
</tr>

<tr>
  <td>body1</td> 
  <td>body2</td>
  <td>body3</td>
</tr>
<tr>
  <td>body21</td> 
  <td>body22</td>
  <td>body23</td>
</tr>
</table>";

The regex I wrote is this:

$html = preg_replace_callback(
    '#(<tr.*?<th>.*?<th>.*?<\/tr>)#s',
    function ($match) {
        return '<thead>' . $match[0] . '</thead>';
    },
    $html
);

But the result is different from what I want: each tr ends up in its own thead.


There are 2 answers

1
simbabque (best answer)

It's not a good idea to try to parse HTML with regular expressions.
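For comparison, here is a minimal sketch of a parser-based approach using PHP's DOM extension. It is not part of the original answer. It assumes the markup is a single table fragment, that a "header row" means a tr with th children and no td children, and that nested tables do not occur. The variable names are illustrative only.

```php
<?php
// Sample input, shortened from the question.
$html = "<table>
<tr><th>header1</th><th>header2</th></tr>
<tr><th>header21</th><th>header22</th></tr>
<tr><td>body1</td><td>body2</td></tr>
</table>";

$doc = new DOMDocument();
// These flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//table') as $table) {
    // Assumption: a header row has th children and no td children.
    $headerRows = iterator_to_array($xpath->query('.//tr[th and not(td)]', $table));
    if (count($headerRows) === 0) {
        continue;
    }
    // Create one thead per table and put it first in the table.
    $thead = $doc->createElement('thead');
    $table->insertBefore($thead, $table->firstChild);
    // appendChild() moves each existing row into the thead.
    foreach ($headerRows as $row) {
        $thead->appendChild($row);
    }
}

$result = $doc->saveHTML();
echo $result;
```

Because the rows are moved as DOM nodes, attributes on tr or th and unusual whitespace do not affect the outcome the way they would with a pattern match.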

That said, you need to get rid of one question mark. With the question mark, .*? is lazy: it matches an unlimited number of characters, but as few as possible. Between the first and the last <th> you want it to match as many characters as possible, so that .* has to be greedy. This will do the trick:

              #this is supposed to be as greedy as possible
              #
~(<tr.*?<th>.*<th>.*?</tr>)~s

See https://regex101.com/r/fR1xB5/1
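As a rough check, the pattern above can be dropped into the preg_replace_callback call from the question. The sample table here is shortened from the question's, and the sketch assumes the string holds a single table; with several tables the greedy .* would run on to the last <th> in the whole string.

```php
<?php
// Shortened version of the table from the question.
$html = "
<table>
<tr>
  <th>header1</th>
  <th>header2</th>
</tr>
<tr>
  <th>header21</th>
  <th>header22</th>
</tr>
<tr>
  <td>body1</td>
  <td>body2</td>
</tr>
</table>";

$result = preg_replace_callback(
    // The middle .* is greedy, so the match extends to the last <th>
    // in the string and then lazily to the next </tr>.
    '~(<tr.*?<th>.*<th>.*?</tr>)~s',
    function ($match) {
        return '<thead>' . $match[0] . '</thead>';
    },
    $html
);

echo $result;
```

Both header rows end up inside a single thead, and the td row is left outside it.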

0
Kerwin

If there are two tables on the page, it is better to try the one below.

   (<tr>\s*(<th>((?!<tr>).)*</th>)+\s*</tr>)

Example: https://regex101.com/r/fR1xB5/2
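Used as written, the pattern above matches one header row at a time, so a callback would still produce one thead per row. The sketch below is an adaptation, not the answer's exact pattern: it adds delimiters and the s flag (assumed here so that . can cross line breaks) and wraps the row pattern in a repeated group so that consecutive header rows are captured together. It also assumes plain <tr> and <th> tags without attributes.

```php
<?php
// Two tables, each with its own header rows.
$html = <<<'HTML'
<table>
<tr>
  <th>a1</th>
  <th>a2</th>
</tr>
<tr>
  <th>a3</th>
  <th>a4</th>
</tr>
<tr>
  <td>x</td>
</tr>
</table>
<table>
<tr>
  <th>b1</th>
</tr>
<tr>
  <td>y</td>
</tr>
</table>
HTML;

$result = preg_replace_callback(
    // (?!<tr>) keeps a row match from running into the next row, and the
    // outer (?: ... )+ collects a run of consecutive header rows.
    '~((?:<tr>\s*(?:<th>(?:(?!<tr>).)*</th>)+\s*</tr>\s*)+)~s',
    function ($match) {
        return '<thead>' . $match[0] . '</thead>';
    },
    $html
);

echo $result;
```

Each table gets its own thead, because a run of header rows ends at the first row that does not start with a th.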