Error when using XML API findall() in with python 2.6

Question

Error when using XML API findall() in with python 2.6

1.2k views Asked by Omar At 01 December 2014 at 11:17

I'm using this code below to retrieve information from the Alexa API, this code works well on Python 2.7 but I have to use Python 2.6 and it gives me an error:

findall() takes exactly 2 arguments (3 given)

I presume that this method has change in Python 2.7 but I don't know how to make it work in 2.6.

NS_PREFIXES = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

tree = api.sites_linking_in(domain + ".eu", count=10, start=0)
alexa_sites_linkin_in = {}
for element in tree.findall('//awis:SitesLinkingIn/awis:Site',NS_PREFIXES):
    alexa_sites_linkin_in.update({
    element.find('awis:Title', NS_PREFIXES).text: element.find('awis:Url', "awis").text})

Original Q&A

There are 2 answers

roskakori On 11 January 2015 at 04:44

With Python 2.6 (and earlier) you need to register the namespace manually and resolve it into the Clark notation before for find() is able to recognize them.

First, register the namespace as describe at http://effbot.org/zone/element-namespaces.htm:

from xml import ElementTree
try:
    register_namespace = ElementTree.register_namespace
except AttributeError:
    def register_namespace(prefix, uri):
        ElementTree._namespace_map[uri] = prefix

for short_name, url in NS_PREFIXES.items():
    register_namespace(short_name, url)

Next you need to resolve the namespaced XPaths yourself into the Clark notation, which find() uses internally. For example, awis:Title resolves to {http://awis.amazonaws.com/doc/2005-07-11}Title:

def resolved_xpath(xpath, namespace):
    result = xpath
    for short_name, url in namespace.items():
        result = re.sub(r'\b' + short_name + ':', '{' + url + '}', result)
    return result

Now it is easy to write a modified find() and findall() that honors namespaces even with Python 2.6:

def find_with_namespace(element, xpath, namespace):
    return element.find(resolved_xpath(xpath, namespace))

def findall_with_namespace(element, xpath, namespace):
    return element.findall(resolved_xpath(xpath, namespace))

Your example can be implemented as:

NS_PREFIXES = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

tree = api.sites_linking_in(domain + ".eu", count=10, start=0)
alexa_sites_linkin_in = {}
for element in findall_with_namespace(tree, '//awis:SitesLinkingIn/awis:Site',NS_PREFIXES):
    title = find_with_namespace(element, 'awis:Title', NS_PREFIXES).text
    url = find_with_namespace(element, 'awis:Url', NS_PREFIXES).text
    alexa_sites_linkin_in[title] = url

So, yes, if possible, use lxml.

**Alex Lisovoy** · Accepted Answer · 2014-12-01T11:39:48+00:00

Alex Lisovoy On 01 December 2014 at 11:39 BEST ANSWER

The api used lxml(ElementTree as backport) for parsing xml. The lxml allowed additional argument - namespace, but ElementTree does not allow. That is problem. So as hotfix I recommend install lxml.

TechQA.

Error when using XML API findall() in with python 2.6

There are 2 answers

Related Questions in PYTHON

Related Questions in XML

Related Questions in ALEXA-INTERNET

Popular Questions

Trending Questions