I am building a program (Visual Studio 2010, .NET 4, C#-based console application) to gather specific information from a publicly available government report that is only available as an XML download. Its structure is similar to the following:
<Collections>
  <Collection>
    <Info id="123456" address="Some Place" name="Some Name"/>
    <Items>
      <Item1/>
      <Item2/>
      <Item3 I3="Y"/>
      <Item3A I3A1="N" I3A2="N" I3A3="Y"/>
      <Item3B I3B1="N" I3B2="N"/>
      <Item4/>
    </Items>
  </Collection>
  <Collection>...</Collection>
  <Collection>...</Collection>
</Collections>
The full file has hundreds of blocks and ranges from 50 to 100 MB. I have never worked with XML formatted even remotely like this (it looks awful, right?) and have had a lot of trouble finding any examples of queries that are useful.
I need to return the id from the Info element for all nodes that have a "Y" in the elements Item3 through Item3B. It's driving me a little crazy, because it would be easy if they had matching element names and matching attributes, but they are all unique. You can't include a wildcard in an XPath query like /Item3*[Q3*="Y"].
Does anybody have any ideas on how to tackle this? Thanks!
The right answer depends on the exact "rules" for selecting nodes. It's not clear whether you are always looking for Item3 through Item3B or whether they are just examples of the rule. I also assume that by "nodes that have a 'Y' in the elements" you mean that they have an attribute whose value equals "Y".
If you are interested in exactly three element nodes with exactly the names "Item3", "Item3A" and "Item3B", and if the "Y" value can be on any attribute, use
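A sketch of such an expression, assuming the structure from the sample in the question (a Collections root, with Info and Items as children of each Collection); treat it as one possible formulation rather than the only one:

```xpath
/Collections/Collection[Items/*[self::Item3 or self::Item3A or self::Item3B]/@* = 'Y']/Info/@id
```

The predicate on Collection compares a node-set of attributes with the string 'Y', which in XPath 1.0 is true as soon as one attribute in the set has that value. For the first Collection in the sample, this selects the id attribute with the value 123456.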
Else, if the rule only says that element names must start with "Item3", use
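Under the same assumptions about the document structure, the looser rule could be written with starts-with() and name():

```xpath
/Collections/Collection[Items/*[starts-with(name(), 'Item3')]/@* = 'Y']/Info/@id
```

Note that this would also match an element such as Item30, if the report contains one, so it is only appropriate when every name beginning with "Item3" really belongs to the group.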
If there are namespaces in your input XML document, it would be safer to use the local-name() function instead of name().
It seems you are also trying to match attributes that start with a certain string:
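A sketch that filters both element and attribute names by prefix, assuming the attribute prefix is "I3" as in the sample (the expression in the question uses "Q3" instead, so adjust the string to match the real data):

```xpath
/Collections/Collection[Items/*[starts-with(name(), 'Item3')]/@*[starts-with(name(), 'I3')] = 'Y']/Info/@id
```

Here @* plays the role of the attribute wildcard, and the predicate with starts-with() narrows it to the names of interest.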
As you can see, the claim that you can't include a wildcard in an XPath query is not really true: there are "wildcards" (although you don't usually call them that), but you need the right syntax.