I want to read tag values like `<title>` and `<title_id>` from an XML file. I can read the value of `<title>` successfully. Is it possible to read both `<title>` and `<title_id>` in the same loop?
Please help me, I'm new to XML.
```xml
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.5/ http://www.mediawiki.org/xml/export-0.5.xsd" version="0.5" xml:lang="en">
  <siteinfo>
    <sitename>Wiki</sitename>
    <case>first-letter</case>
    <namespaces>
      <namespace key="0" case="first-letter" />
    </namespaces>
  </siteinfo>
  <page>
    <title>Sex</title>
    <title_id>31239628</title_id>
    <revision>
      <id>437708703</id>
      <timestamp>2011-07-04T13:53:52Z</timestamp>
      <text xml:space="preserve" bytes="6830">{{ Hello}}
      </text>
    </revision>
  </page>
</mediawiki>
```
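I'm using the following code to read all the titles from the file, and it's working fine:

```python
import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

# The file declares a default namespace, so element tags are namespace-qualified
NS = '{http://www.mediawiki.org/xml/export-0.5/}'

tree = etree.parse('find_title.xml')
for value in tree.iter(NS + 'title'):  # iter() replaces getiterator(), removed in Python 3.9
    print(value.text)
```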
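To read both values in the same loop with ElementTree, one option is to iterate over each `<page>` element instead of over `<title>` elements, and pull both children out of that page with `findtext()`. A minimal sketch, assuming Python 3 and a trimmed copy of the sample above embedded as a string; `read_pages` and the `mw` prefix are my own hypothetical names. Because the sample declares a default namespace, element names have to be namespace-qualified, which the `NS` prefix map takes care of.

```python
import xml.etree.ElementTree as etree

# Map a prefix of our choosing ("mw") to the file's default namespace
NS = {'mw': 'http://www.mediawiki.org/xml/export-0.5/'}

SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" version="0.5">
  <page>
    <title>Sex</title>
    <title_id>31239628</title_id>
  </page>
</mediawiki>"""


def read_pages(root):
    """Return (title, title_id) pairs, one per <page>; a missing child gives None."""
    pairs = []
    for page in root.iterfind('mw:page', NS):
        # namespaces must be passed by keyword: findtext's 2nd positional arg is default
        title = page.findtext('mw:title', namespaces=NS)
        title_id = page.findtext('mw:title_id', namespaces=NS)
        pairs.append((title, title_id))
    return pairs


root = etree.fromstring(SAMPLE)
for title, title_id in read_pages(root):
    print(title, title_id)  # prints: Sex 31239628
```

With the real file, pass `etree.parse('find_title.xml').getroot()` to `read_pages` instead of parsing the embedded string.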
If you are going to be working with XML a lot, I'd suggest you familiarise yourself with XPath.
Here's a quick snippet using my XML library of preference, lxml.
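A minimal sketch of the XPath approach with lxml, assuming lxml is installed and using a trimmed copy of the sample. XPath 1.0 has no notion of a default namespace, so a prefix (here `mw`, an arbitrary choice) must be bound to the MediaWiki namespace and used in every step of the path. `first` is a hypothetical helper that returns the first match or a default, since `xpath()` returns a list.

```python
from lxml import etree

# Bind an arbitrary prefix to the document's default namespace
NS = {'mw': 'http://www.mediawiki.org/xml/export-0.5/'}

SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/" version="0.5">
  <page>
    <title>Sex</title>
    <title_id>31239628</title_id>
  </page>
</mediawiki>"""


def first(seq, default=None):
    """Return the first item of a sequence, or default if it is empty."""
    for item in seq:
        return item
    return default


root = etree.fromstring(SAMPLE)
# text() selects the text nodes; xpath() returns them as a list of strings
title = first(root.xpath('//mw:title/text()', namespaces=NS))
title_id = first(root.xpath('//mw:title_id/text()', namespaces=NS))
print(title, title_id)  # prints: Sex 31239628
```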
Update: supposing there are multiple page elements
XPath queries mostly return node sequences (hence the `first` function). You could use a single query that returned the values of both tags for all of the pages, but you would then have to group them together, and if a subelement was missing from a page you'd be out of step. You could write the query to ensure the subelements existed, but you might want to know that there was a partial record, and so on.
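To see how querying each tag across all pages can get out of step, here is a small sketch. It swaps in the standard library's ElementTree rather than lxml: ElementTree supports only a subset of XPath, but that subset includes `[tag]` predicates, which is enough to show the "ensure the subelements exist" variant too. The two-page sample is invented for illustration; its first page deliberately lacks `<title_id>`.

```python
import xml.etree.ElementTree as etree

NS = {'mw': 'http://www.mediawiki.org/xml/export-0.5/'}

# Hypothetical sample: page A is a partial record with no <title_id>
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page><title>A</title></page>
  <page><title>B</title><title_id>2</title_id></page>
</mediawiki>"""

root = etree.fromstring(SAMPLE)

# One query per tag across all pages, then pairing by position
titles = [t.text for t in root.findall('mw:page/mw:title', NS)]
ids = [i.text for i in root.findall('mw:page/mw:title_id', NS)]
naive = list(zip(titles, ids))
print(naive)  # prints [('A', '2')]: B's id ends up paired with A

# Requiring both children with [tag] predicates keeps the pairs aligned,
# but silently drops the partial record for page A
complete = [
    (p.findtext('mw:title', namespaces=NS), p.findtext('mw:title_id', namespaces=NS))
    for p in root.findall('mw:page[mw:title][mw:title_id]', NS)
]
print(complete)  # prints [('B', '2')]
```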
So my first answer to this would be to loop through the pages like so:
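A minimal sketch of such a per-page loop with lxml, assuming lxml is installed; the two-page sample is invented for illustration and its first page lacks `<title_id>`. Each query runs relative to the current `page` element, so a missing child shows up as `None` for that page instead of shifting the pairing. `first` is a hypothetical helper, and the `mw` prefix is an arbitrary name bound to the MediaWiki namespace.

```python
from lxml import etree

NS = {'mw': 'http://www.mediawiki.org/xml/export-0.5/'}

# Hypothetical sample: the first page is a partial record
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page><title>A</title></page>
  <page><title>B</title><title_id>2</title_id></page>
</mediawiki>"""


def first(seq, default=None):
    """Return the first item of a sequence, or default if it is empty."""
    for item in seq:
        return item
    return default


root = etree.fromstring(SAMPLE)
records = []
for page in root.xpath('//mw:page', namespaces=NS):
    # Paths without a leading slash are evaluated relative to this page
    title = first(page.xpath('mw:title/text()', namespaces=NS))
    title_id = first(page.xpath('mw:title_id/text()', namespaces=NS))
    records.append((title, title_id))

for title, title_id in records:
    print(title, title_id)  # prints "A None", then "B 2"
```

Keeping the `None` makes partial records visible, so you can decide per page whether to skip them, log them, or fill in a default.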