Parse OAI2 XML format


I have this format to parse: http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:0804.2273&metadataPrefix=arXiv

I want to get the authors' names:

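import urllib.request
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}tag" form
# (assumed values; their definitions are not shown in the post)
OAI = '{http://www.openarchives.org/OAI/2.0/}'
ARXIV = '{http://arxiv.org/OAI/arXiv/}'

# url_to_fetch holds the OAI request URL (its definition is not shown)
response = urllib.request.urlopen(url_to_fetch)
xml = response.read()

root = ET.fromstring(xml)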


for record in root.find(OAI+'ListRecords').findall(OAI+"record"):
    meta = record.find(OAI+'metadata')
    info = meta.find(ARXIV+"arXiv")
    authors = info.findall(ARXIV + 'authors/' + ARXIV + 'author')
    for author in authors:
        forenames = author.find(ARXIV+'forenames').text
        keyname = author.find(ARXIV+'keyname').text
        print(forenames)
        print(keyname)

But I get this error:

forenames = author.find(ARXIV+'forenames').text
AttributeError: 'NoneType' object has no attribute 'text'

The problem is with forenames: if I remove the lines that read it, everything works fine. How can I fix it?
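
One likely cause, assuming some records list an author with no forenames element (for example a collaboration given only by keyname): find() returns None for that author, so reading .text raises the AttributeError. Below is a minimal sketch of one way to tolerate that, using findtext() with a default value. The SAMPLE document and the author_names helper are hypothetical and only for illustration, and the namespace URIs are assumed to match the ones used in the question.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}tag" form (assumed values).
OAI = "{http://www.openarchives.org/OAI/2.0/}"
ARXIV = "{http://arxiv.org/OAI/arXiv/}"

# Hypothetical sample: the second author has no <forenames> element.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <arXiv xmlns="http://arxiv.org/OAI/arXiv/">
          <authors>
            <author><keyname>Doe</keyname><forenames>Jane</forenames></author>
            <author><keyname>Example Collaboration</keyname></author>
          </authors>
        </arXiv>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def author_names(xml_text):
    """Return (forenames, keyname) pairs; a missing element becomes ''."""
    root = ET.fromstring(xml_text)
    names = []
    # iter() visits every arXiv <author> element anywhere below the root.
    for author in root.iter(ARXIV + "author"):
        # findtext() returns the default instead of failing when the child is absent.
        forenames = author.findtext(ARXIV + "forenames", default="")
        keyname = author.findtext(ARXIV + "keyname", default="")
        names.append((forenames, keyname))
    return names


for forenames, keyname in author_names(SAMPLE):
    print(forenames, keyname)
```

The same idea works inside the original loop: replace author.find(ARXIV+'forenames').text with author.findtext(ARXIV+'forenames', default=''), or check the result of find() for None before reading .text.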
