Error when using XML API findall() in with python 2.6

1.1k views Asked by At

I'm using this code below to retrieve information from the Alexa API, this code works well on Python 2.7 but I have to use Python 2.6 and it gives me an error:

findall() takes exactly 2 arguments (3 given)

I presume that this method has change in Python 2.7 but I don't know how to make it work in 2.6.

NS_PREFIXES = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

tree = api.sites_linking_in(domain + ".eu", count=10, start=0)
alexa_sites_linkin_in = {}
for element in tree.findall('//awis:SitesLinkingIn/awis:Site',NS_PREFIXES):
    alexa_sites_linkin_in.update({
    element.find('awis:Title', NS_PREFIXES).text: element.find('awis:Url', "awis").text})
2

There are 2 answers

0
Alex Lisovoy On BEST ANSWER

The api used lxml(ElementTree as backport) for parsing xml. The lxml allowed additional argument - namespace, but ElementTree does not allow. That is problem. So as hotfix I recommend install lxml.

0
roskakori On

With Python 2.6 (and earlier) you need to register the namespace manually and resolve it into the Clark notation before for find() is able to recognize them.

First, register the namespace as describe at http://effbot.org/zone/element-namespaces.htm:

from xml import ElementTree
try:
    register_namespace = ElementTree.register_namespace
except AttributeError:
    def register_namespace(prefix, uri):
        ElementTree._namespace_map[uri] = prefix

for short_name, url in NS_PREFIXES.items():
    register_namespace(short_name, url)

Next you need to resolve the namespaced XPaths yourself into the Clark notation, which find() uses internally. For example, awis:Title resolves to {http://awis.amazonaws.com/doc/2005-07-11}Title:

def resolved_xpath(xpath, namespace):
    result = xpath
    for short_name, url in namespace.items():
        result = re.sub(r'\b' + short_name + ':', '{' + url + '}', result)
    return result

Now it is easy to write a modified find() and findall() that honors namespaces even with Python 2.6:

def find_with_namespace(element, xpath, namespace):
    return element.find(resolved_xpath(xpath, namespace))

def findall_with_namespace(element, xpath, namespace):
    return element.findall(resolved_xpath(xpath, namespace))

Your example can be implemented as:

NS_PREFIXES = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

tree = api.sites_linking_in(domain + ".eu", count=10, start=0)
alexa_sites_linkin_in = {}
for element in findall_with_namespace(tree, '//awis:SitesLinkingIn/awis:Site',NS_PREFIXES):
    title = find_with_namespace(element, 'awis:Title', NS_PREFIXES).text
    url = find_with_namespace(element, 'awis:Url', NS_PREFIXES).text
    alexa_sites_linkin_in[title] = url

So, yes, if possible, use lxml.