simplexml_load_string not parsing my XML string. Charset issue?

I'm using the following PHP code to read XML data from NOAA's tide reporting station API:

$rawxml = file_get_contents(
    "http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/"
    ."response.jsp?v=2&format=xml&Submit=Submit"
);
$rawxml = utf8_encode($rawxml);
$ob = simplexml_load_string($rawxml);
var_dump($ob);

Unfortunately, I end up with it displaying this:

object(SimpleXMLElement)#246 (0) { }

It looks to me like the XML is perfectly well-formed - why won't this parse? From looking at another question (Simplexml_load_string() fail to parse error) I got the idea that the header might be the problem - the http call does indeed return a charset value of "ISO-8859-1". But adding in the utf8_encode() call doesn't seem to do the trick.

What's especially confusing is that simplexml_load_string() doesn't actually fail - it returns a cheerful XML array, just with nothing in it!

There is 1 answer

IMSoP On BEST ANSWER

You've been fooled (and had me fooled) by the oldest trick in the SimpleXML book: SimpleXML doesn't parse the whole document into a PHP object, it presents a PHP API to an internal structure. Functions like var_dump can't see this structure, so don't always give a useful idea of what's in the object.

The reason it looks "empty" is that it is listing the children of the root element which are in the default namespace. There aren't any: they are all in the namespace bound to the "soapenv:" prefix.
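
You can check this for yourself without calling the remote service. The following is only a minimal sketch: the sample document, including the stations and station elements and the "demo-1" ID, is made up for illustration and is not the NOAA response. It compares what the default-namespace view contains with what is actually in the document:

```php
<?php
// Hypothetical stand-in for the real response: every child of the root
// is in the SOAP envelope namespace, none in the default namespace.
$xml = <<<XML
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <stations>
      <station ID="demo-1"/>
    </stations>
  </soapenv:Body>
</soapenv:Envelope>
XML;

$ob = simplexml_load_string($xml);

// A parse failure would return false, not an "empty" object.
var_dump($ob !== false);

// Number of root children outside any prefixed namespace.
echo count($ob->children()), "\n";

// Namespaces used anywhere in the document, keyed by prefix.
print_r($ob->getNamespaces(true));

// asXML() serialises the real internal structure, so nothing was lost.
echo $ob->asXML(), "\n";
```

If the count is zero while asXML() still prints the full document, the content is there and only the namespace being inspected is wrong.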

To access namespaced elements, you need to use the children() method, passing in either the full namespace URI (recommended) or its local prefix with true as the second argument (simpler, but liable to break if the other end changes the way the file is generated). To switch back to the "default namespace", use ->children(null).
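
As a rough sketch of the two ways of selecting a namespace, here is the same lookup done by URI and by prefix. The sample document, its stations and station elements, and the "demo-1" ID are assumptions made up for illustration; only children() and its second argument come from SimpleXML itself:

```php
<?php
// Hypothetical sample document (not the NOAA response).
$xml = <<<XML
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <stations>
      <station ID="demo-1"/>
    </stations>
  </soapenv:Body>
</soapenv:Envelope>
XML;

$ob = simplexml_load_string($xml);

// By full namespace URI: keeps working if the sender renames the prefix.
$bodyByUri = $ob->children('http://schemas.xmlsoap.org/soap/envelope/')->Body;

// By prefix: passing true as the second argument marks the first as a prefix.
$bodyByPrefix = $ob->children('soapenv', true)->Body;

// Switch back to the un-namespaced elements inside the Body.
$idByUri = (string) $bodyByUri->children(null)->stations->station[0]['ID'];
$idByPrefix = (string) $bodyByPrefix->children(null)->stations->station[0]['ID'];

echo $idByUri, "\n";
echo $idByPrefix, "\n";
```

Both lookups reach the same element; the URI form is the one that survives a change of prefix at the other end.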

So you could get the ID attribute of the first stationV2 element like this (live demo):

// Define a constant for the namespace URI, rather than relying on the
// prefix used by the remote service remaining stable
define('NS_SOAP', 'http://schemas.xmlsoap.org/soap/envelope/');

// Download the XML
$rawxml = file_get_contents("http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/response.jsp?v=2&format=xml&Submit=Submit");
// Parse it
$ob = simplexml_load_string($rawxml);

// Use it!
echo $ob->children(NS_SOAP)->Body->children(null)->ActiveStationsV2->stationsV2->stationV2[0]['ID'];

I've written some debugging functions to use with SimpleXML which should be much less misleading than var_dump etc. Here's a live demo with your code and simplexml_dump.