Using LINQ to query XDocument, how to get specific values?


I'm trying to refactor the following code. It works, but it will become unmanageable once I need to read more elements from the XML:

HttpResponseMessage response = await httpClient.GetAsync("https://uri/products.xml");

string responseAsString = await response.Content.ReadAsStringAsync();

List<Product> productList = new List<Product>();

XDocument xdocument = XDocument.Parse(responseAsString);
var products = xdocument.Descendants().Where(p => p.Name.LocalName == "item");

foreach(var product in products)
{
    var thisProduct = new Product();
    foreach (XElement el in product.Nodes())
    {
        if(el.Name.LocalName == "id")
        {
            thisProduct.SKU = el.Value.Replace("-master", "");
        }
        if (el.Name.LocalName == "availability")
        {
            thisProduct.Availability = el.Value == "in stock";
        }
    }
    productList.Add(thisProduct);
}

The URL returns XML like the following:

<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns="http://base.google.com/ns/1.0" version="0">
    <channel>
        <title>Product Feed</title>
        <link></link>
        <description>Products</description>
        <item>
            <availability>in stock</availability>
            <id>01234-master</id>
            ...
        </item>
        <item>
            <availability>in stock</availability>
            <id>abcde-master</id>
            ...
        </item>
    </channel>
</rss>

Ideally I'd like to remove the loops and if statements and have a LINQ query that returns only the fields I need (id, availability, etc.) from the XML in a clean way and populates a simple class with this data.

Can anyone help?

There are 2 answers

xanatos (best answer)

Sometimes you have to be happy with the code you have written. Sometimes there is no "smarter" way to write it... You can only write it a little "better":

List<Product> productList = new List<Product>();

XDocument xdocument = XDocument.Parse(responseAsString);

XNamespace ns = "http://base.google.com/ns/1.0";

var products = from x in xdocument.Elements(ns + "rss")
               from y in x.Elements(ns + "channel")
               from z in y.Elements(ns + "item")
               select z;

foreach (var product in products)
{
    var prod = new Product();
    productList.Add(prod);

    foreach (XElement el in product.Elements())
    {
        if (el.Name == ns + "id")
        {
            prod.SKU = el.Value.Replace("-master", string.Empty);
        }
        else if (el.Name == ns + "availability")
        {
            prod.Availability = el.Value == "in stock";
        }
    }
}

Notes:

  • Descendants() is morally wrong here. The items sit at a fixed position, /rss/channel/item, and you know it perfectly well. It isn't //item, because tomorrow there could be an rss/foo/item that doesn't exist today. Try to write your code so that it is forward compatible with additional information that could be added to the XML.
  • I do hate XML namespaces... And there are XML documents with multiple nested namespaces. How much I hate those. But someone more intelligent than me decided that they exist. I accept it. I code using them. In LINQ to XML it is quite easy: there is an XNamespace type that even has an overloaded + operator.

    Note that if you are a micro-optimizer (I try not to be, but I have to admit my hands are itching a little), you can pre-calculate the various ns + "xxx" values that are used inside the foreach loop. It isn't obvious from the code, but they are all re-evaluated on every iteration. And how an XName is built inside... oh... that is a fascinating thing, trust me.

    private static readonly XNamespace googleNs = "http://base.google.com/ns/1.0";
    private static readonly XName idName = googleNs + "id";
    private static readonly XName availabilityName = googleNs + "availability";
    

    and then

    if (el.Name == idName)
    {
        prod.SKU = el.Value.Replace("-master", string.Empty);
    }
    else if (el.Name == availabilityName)
    {
        prod.Availability = el.Value == "in stock";
    }
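
To make the first note concrete, here is a minimal sketch of how Descendants() and the explicit rss/channel/item path diverge once the feed grows an extra branch. The rss/foo/item branch, the sample XML, and the PathDemo class are made up for illustration; they are not part of the feed in the question. The names are built once in static fields, in the spirit of the micro-optimization note.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PathDemo
{
    // Names built once instead of on every use.
    private static readonly XNamespace GoogleNs = "http://base.google.com/ns/1.0";
    private static readonly XName RssName = GoogleNs + "rss";
    private static readonly XName ChannelName = GoogleNs + "channel";
    private static readonly XName ItemName = GoogleNs + "item";

    // Hypothetical feed: one real product plus an unrelated <item> under <foo>.
    public const string SampleXml = @"<rss xmlns=""http://base.google.com/ns/1.0"" version=""0"">
  <channel>
    <item><availability>in stock</availability><id>01234-master</id></item>
  </channel>
  <foo>
    <item><id>not-a-product</id></item>
  </foo>
</rss>";

    // Matches any <item> anywhere in the document (the //item behaviour).
    public static int CountWithDescendants(XDocument doc) =>
        doc.Descendants(ItemName).Count();

    // Matches only /rss/channel/item.
    public static int CountWithExplicitPath(XDocument doc) =>
        doc.Elements(RssName).Elements(ChannelName).Elements(ItemName).Count();

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        Console.WriteLine(CountWithDescendants(doc));   // 2: also picks up rss/foo/item
        Console.WriteLine(CountWithExplicitPath(doc));  // 1: only rss/channel/item
    }
}
```

With this input the Descendants() version would silently turn the stray item into a product, while the explicit path ignores it.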
    
jdweng

Try the following:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {

            new Item(FILENAME);

        }
    }
    public class Item
    {
        public static List<Item> items { get; set; }

        public string availability { get; set; }
        public string id { get; set; }

        public Item() { }
        public Item(string filename)
        {
            string xml = File.ReadAllText(filename);

            XDocument doc = XDocument.Parse(xml);
            XElement root = doc.Root;
            XNamespace ns = root.GetDefaultNamespace();

            Item.items = doc.Descendants(ns + "item").Select(x => new Item() {
                availability = (string)x.Element(ns + "availability"),
                id = (string)x.Element(ns + "id")
            }).ToList();
        }
    }
}
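
The projection above keeps the raw strings. As a sketch of how the same idea could fill the Product class from the question, the version below strips the "-master" suffix and converts availability to a bool inside the Select. The Product shape (SKU, Availability) is inferred from the question's code, FeedParser and ParseProducts are hypothetical names, and the query assumes the explicit rss/channel/item path from the first answer rather than Descendants().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Shape inferred from the question's code; the real class may differ.
public class Product
{
    public string SKU { get; set; }
    public bool Availability { get; set; }
}

public static class FeedParser
{
    private static readonly XNamespace Ns = "http://base.google.com/ns/1.0";

    public static List<Product> ParseProducts(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Elements(Ns + "rss")
                  .Elements(Ns + "channel")
                  .Elements(Ns + "item")
                  .Select(item => new Product
                  {
                      // The (string) cast yields null when the element is missing,
                      // so ?. and ?? keep an item without <id> from throwing.
                      SKU = ((string)item.Element(Ns + "id"))?.Replace("-master", string.Empty) ?? string.Empty,
                      // A missing <availability> compares as null and ends up false.
                      Availability = (string)item.Element(Ns + "availability") == "in stock"
                  })
                  .ToList();
    }
}
```

Calling FeedParser.ParseProducts(responseAsString) would replace both loops from the question. Adding another field then means adding one more line to the object initializer.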