How to parse/extract url from an xml file?


I have an XML file that contains the following type of data

<definition name="/products/phone" path="/main/something.jsp" > </definition>

There are dozens of nodes in the xml file.

What I want to do is extract the path stored in the 'name' attribute and turn it into a full URL, so my end result will be:

http://www.mysite.com/products/phone.jsp

Can I do this with a so-called XML parser? I have no idea where to begin. Can someone point me in the right direction? What tools do I need to achieve something like that?

I am particularly interested in doing this with PHP.


There are 2 answers

AFKAP On

Given XML this simple, it is easy to read the "name" attribute and combine it with a base URL and the expected file extension.

If you are comfortable with C#, and you know there is one and only one "definition" element, here is a self-contained little program that does what you require (it assumes you are loading the XML from a string):

using System;
using System.Xml;

public class parseXml
{
    // No trailing slash here: the "name" attribute already starts with "/".
    private const string myDomain = "http://www.mysite.com";
    private const string myExtension = ".jsp";

    public static void Main()
    {
        string xmlString = "<definition name='/products/phone' path='/main/something.jsp'> </definition>";

        XmlDocument doc = new XmlDocument();

        doc.LoadXml(xmlString);

        // Use .Value to get the attribute text; ToString() would return the type name instead.
        string url =    myDomain +
                        doc.DocumentElement.SelectSingleNode("//definition").Attributes["name"].Value +
                        myExtension;

        Console.WriteLine("Original XML: {0}\nResultant URL: {1}", xmlString, url);
    }
}

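You are going to need to be careful with SelectSingleNode above: it returns only the first node that matches the XPath expression. The expression "//definition" matches "definition" elements anywhere in the document, so if your file contains more than one, use SelectNodes and loop over the results instead.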

Fundamentally, it's worthwhile to read a primer on XML. XML is not difficult; it's a self-describing hierarchical data format: lots of nested text, angle brackets, and quotation marks :).

A good primer is probably the one at W3Schools: http://www.w3schools.com/xml/xml_whatis.asp

You may also want to read up on streaming parsers (SAX, or XmlReader in .NET) versus parsers that load the whole document into memory (DOM, or XmlDocument in .NET). See: What is the difference between SAX and DOM?
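Since the question asks about PHP, here is a minimal sketch of the streaming approach using PHP's XMLReader, which visits one node at a time instead of building the whole tree in memory. The function name definitionNames and the sample XML are made up for illustration; treat this as a starting point rather than a finished solution.

```php
<?php
// Hypothetical helper: collect the "name" attribute of every <definition>
// element while streaming through the XML, without building a DOM tree.
function definitionNames(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $names = [];
    while ($reader->read()) {
        // Only react to opening <definition> tags.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'definition') {
            $name = $reader->getAttribute('name'); // null when the attribute is missing
            if ($name !== null && $name !== '') {
                $names[] = $name;
            }
        }
    }
    $reader->close();

    return $names;
}

// Illustrative input, modelled on the snippet in the question.
$sample = '<root>'
        . '<definition name="/products/phone" path="/main/something.jsp"> </definition>'
        . '<definition name="/products/cell" path="/main/something.jsp"> </definition>'
        . '</root>';

print_r(definitionNames($sample));
```

For a file with only dozens of nodes, the DOM approach is simpler and perfectly adequate; streaming mainly pays off when the file is too large to load comfortably.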

I can provide a Java example too, if you feel that would be helpful.

Wiktor Stribiżew On

Not sure if you solved your problem, so here is a PHP solution:

$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
<definition name="/products/mobile" path="/main/something.jsp"> </definition>
</root>
DATA;

$arr = array();
$dom = new DOMDocument('1.0', 'UTF-8');
// Parse the string as XML; loadHTML() would push it through the HTML parser and emit warnings.
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$defs = $xpath->query('//definition');

foreach($defs as $def) { 
   $attr = $def->getAttribute('name');
   if ($attr != "") {
      array_push($arr, $attr);
   }
}
print_r($arr);

See IDEONE demo

Result:

Array
(
    [0] => /products/phone
    [1] => /products/cell
    [2] => /products/mobile
)
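To get from these paths to the full URLs the question asks for, the base URL and extension still have to be added. A minimal sketch, assuming the base URL http://www.mysite.com and the .jsp extension from the question's example; the constant names and the definitionUrls function are hypothetical:

```php
<?php
// Assumed values, taken from the example URL in the question.
const BASE_URL  = 'http://www.mysite.com';
const EXTENSION = '.jsp';

// Hypothetical helper: return one full URL per <definition> element
// that carries a non-empty "name" attribute.
function definitionUrls(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $xpath = new DOMXPath($dom);
    $urls = [];
    foreach ($xpath->query('//definition[@name]') as $def) {
        $name = $def->getAttribute('name');
        if ($name !== '') {
            // The name already starts with "/", so no extra separator is needed.
            $urls[] = BASE_URL . $name . EXTENSION;
        }
    }

    return $urls;
}

$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
</root>
DATA;

print_r(definitionUrls($xml));
```

To read from a file instead of a string, $dom->load('yourfile.xml') can replace the loadXML call (the file name here is only a placeholder).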