Retrieving data from an HTML table using PHP

I am aware that this question has been asked many times, but I have looked at many examples and I still cannot get the data I need out of this HTML table.

I have a PHP file that generates an HTML table like this:

    <table width="97%">
    <tr><td align="center">
    <!-- table for columns -->
    <table border="0" cellpadding="15">
    <tr>
        <td valign="top">

        <table border="0" width="800">
        <caption style="font-size: 32px; font-weight: bold;">
        </caption>

        <!-- force column widths exactly (for some reason it didn't want to
        play along with normal width settings) -->
        <tr>
        <td><img src="/spacer.gif" width="160" height="1" border="0" alt="" /></td>
        <td><img src="/spacer.gif" width="170" height="1" border="0" alt="" /></td>
        </tr>
            <tr>
                <td style="">
                DATA1
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>

            <tr>
                <td style="">
                DATA2
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>
            <tr>
                <td style="">
                DATA3
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
        0            </td>
            </tr>
            <tr>
                <td style="">
                DATA4
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                5            </td>
            </tr>
            <tr>
                <td style="">
                DATA5
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>
            <tr>
                <td style="">
                DATA6
                </td>

                <td width="200" style="font-size: 80px; font-weight:bold;">
                0            </td>
            </tr>


        <!-- end of stats_with_style loop -->

        </table>

        </td>



    <!-- end of groups loop -->

    </tr>
    </table>

    <br /><br />


    </td></tr>
    </table>

Using PHP, I want to get the value (the number) for each DATA row, i.e. the content of the styled <td> that follows each label.

Can anyone shed some light on how I can do this?

There are 2 answers

Alex Jegtnes (best answer)

I would normally suggest using a DOM parser like Ganon, but if the HTML's structure stays as simple as this, PHP's native DOM classes and XPath selectors are probably a simpler, lower-overhead solution. Load your HTML into a string, parse it with DOMDocument, and query it with DOMXPath, like this:

<?php
$html = <<<EOF
<table width="97%">
    <tr><td align="center">
    <!--SNIP-->
EOF;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$data = [];

// targets any <td> that has a style attribute and, among those in each row,
// selects only the even-positioned ones, i.e. the value cells
// (XPath counting starts at 1)
foreach($xpath->query("//td[@style][position() mod 2 = 0]") as $node) {
    // strip all whitespace from the cell text
    $data[] = preg_replace('/\s+/', '', $node->nodeValue);
}

You will now have a $data array containing only the numeric values (as strings), which is what you asked for.

If you need the keys (DATA1 etc.) as well, it is fairly straightforward to build an associative array by looping over the odd-positioned cells (the labels). Just add this code:

$keys = [];
foreach($xpath->query("//td[@style][position() mod 2 = 1]") as $node) {
    $keys[] = preg_replace('/\s+/', '', $node->nodeValue);
}

$dataWithKeys = array_combine($keys, $data);
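
The two loops above only line up if every row yields exactly one label and one value in the same order. As a hedged alternative, here is a minimal, self-contained sketch that pairs each label with its value row by row. The sample HTML is a trimmed subset of the table from the question, the row-wise query `//tr[td[@style]]` is my own variation rather than part of the code above, and the `(int)` cast assumes the values are always whole numbers.

```php
<?php
// Trimmed sample of the question's table (an assumption: only two data rows kept).
$html = <<<EOF
<table border="0" width="800">
    <tr>
        <td><img src="/spacer.gif" width="160" height="1" border="0" alt="" /></td>
        <td><img src="/spacer.gif" width="170" height="1" border="0" alt="" /></td>
    </tr>
    <tr>
        <td style="">
        DATA1
        </td>
        <td width="200" style="font-size: 80px; font-weight:bold;">
        0            </td>
    </tr>
    <tr>
        <td style="">
        DATA4
        </td>
        <td width="200" style="font-size: 80px; font-weight:bold;">
        5            </td>
    </tr>
</table>
EOF;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$dataWithKeys = [];

// Visit every row that contains at least one <td> with a style attribute.
foreach ($xpath->query('//tr[td[@style]]') as $row) {
    // Styled cells of this row only: the first is the label, the second the value.
    $cells = $xpath->query('td[@style]', $row);
    if ($cells->length < 2) {
        continue; // skip rows that do not follow the label/value layout
    }
    $label = trim($cells->item(0)->nodeValue);
    $value = (int) trim($cells->item(1)->nodeValue); // assumes integer values
    $dataWithKeys[$label] = $value;
}

print_r($dataWithKeys);
```

For this sample, `$dataWithKeys` ends up as `['DATA1' => 0, 'DATA4' => 5]`. The spacer row is skipped because its cells have no style attribute.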

Hope that helps!

Nathan Wiles

The file is being generated using PHP, but then you want to use PHP to get the data back? Maybe you should save that data elsewhere in the first place, in a format that is easier to read with PHP.
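
As a rough sketch of that idea, assuming the generating script already holds the stats in an array before it renders the table, it could also write them out as JSON and the consuming script could read them back directly. The `$stats` array, its values, and the `stats.json` file name are hypothetical placeholders, and the temp directory is used only so the sketch runs anywhere.

```php
<?php
// Hypothetical stats, standing in for whatever the generating script computes.
$stats = [
    'DATA1' => 0,
    'DATA2' => 0,
    'DATA3' => 0,
    'DATA4' => 5,
    'DATA5' => 0,
    'DATA6' => 0,
];

// Hypothetical location for the shared file.
$path = sys_get_temp_dir() . '/stats.json';

// Producer side: save the data alongside (or instead of) rendering the table.
file_put_contents($path, json_encode($stats, JSON_PRETTY_PRINT));

// Consumer side: decode into an associative array, no HTML parsing needed.
$loaded = json_decode(file_get_contents($path), true);

echo $loaded['DATA4'], PHP_EOL;
```

This keeps the table purely a presentation concern, so changes to its markup cannot break the script that consumes the numbers.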