xpath regex doesn't search tail in lxml.etree

Question

629 views Asked by ngoue At 11 June 2015 at 20:44

I'm working with lxml.etree and want to let users search a DocBook document for text. When a user provides the search text, I use the EXSLT match function to find it within the document. The match works fine if the text appears in element.text, but not if it appears in element.tail.

Here's an example:

>>> import lxml.etree
>>>
>>> # XML as lxml.etree element
>>> root = lxml.etree.fromstring('''
...   <root>
...     <foo>Sample text
...       <bar>and more sample text</bar> and important text.
...     </foo>
...   </root>
... ''')
>>>
>>> # User provides search text    
>>> search_term = 'important'
>>>
>>> # Find nodes with matching text
>>> matches = root.xpath('//*[re:match(text(), $search, "i")]', search=search_term, namespaces={'re':'http://exslt.org/regular-expressions'})
>>> print(matches)
[]
>>>
>>> # But I know it's there...
>>> bar = root.xpath('//bar')[0]
>>> print(bar.tail)
 and important text.

I'm confused because the text() function by itself returns all the text – including the tail:

>>> # text() results (whitespace kept exactly as parsed)
>>> text = root.xpath('//foo/text()')
>>> print(text)
['Sample text\n      ', ' and important text.\n    ']

How come the tail isn't being included when I use the match function?

There is 1 answer

**har07** · Answer 1 · 2015-06-12T00:52:17+00:00

How come the tail isn't being included when I use the match function?

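That's because in XPath 1.0, when a string function such as match() (or contains(), starts-with(), etc.) is given a node-set, it only takes the first node into account. In your example, text() on foo selects two text nodes, foo's text and bar's tail, and only the first one ('Sample text') is tested against the pattern.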
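To see the first-node rule in isolation, here is a minimal sketch. It assumes a single-line version of the question's document (so the whitespace is easy to read) and uses the core contains() function rather than the EXSLT one, since it follows the same conversion rule:

```python
from lxml import etree

# Assumed, whitespace-free variant of the document from the question.
root = etree.fromstring(
    "<root><foo>Sample text <bar>and more sample text</bar>"
    " and important text.</foo></root>"
)

# text() on <foo> selects two separate text nodes: foo.text and bar.tail.
text_nodes = root.xpath("//foo/text()")
print(len(text_nodes))

# Converting that node-set to a string keeps only the first node,
# which is all a string function sees when handed the whole node-set.
first_only = root.xpath("string(//foo/text())")
print(repr(first_only))

# Testing the whole node-set checks only the first text node ...
whole_set = root.xpath('boolean(//foo[contains(text(), "important")])')
# ... while addressing the second text node explicitly finds the tail text.
second_node = root.xpath('boolean(//foo[contains(text()[2], "important")])')
print(whole_set, second_node)
```

The tail text is present in the node-set, but it is never looked at unless each text node is tested individually.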

Instead, you can select every text node with //text(), apply the regex match to each text node individually, and then return the parent element of each matching text node, like so:

xpath = '//text()[re:match(., $search, "i")]/parent::*'
matches = root.xpath(xpath, search=search_term, namespaces={'re':'http://exslt.org/regular-expressions'})
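Put together as a self-contained script, the approach might look like the sketch below. The helper name find_elements_with_text and the single-line sample document are assumptions made for illustration; the XPath expression is the one shown above:

```python
from lxml import etree

# Namespace mapping for the EXSLT regular-expression functions.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}


def find_elements_with_text(root, search_term):
    """Hypothetical helper: return elements that directly contain a matching text node.

    Each text node (element text or tail) is tested on its own, and the
    element that contains the matching text node is returned.
    """
    xpath = '//text()[re:match(., $search, "i")]/parent::*'
    return root.xpath(xpath, search=search_term, namespaces=EXSLT_RE)


# Assumed, whitespace-free variant of the document from the question.
root = etree.fromstring(
    "<root><foo>Sample text <bar>and more sample text</bar>"
    " and important text.</foo></root>"
)

# "important" only occurs in bar's tail, which is a child text node of <foo>.
important_tags = [el.tag for el in find_elements_with_text(root, "important")]
print(important_tags)

# "SAMPLE" matches case-insensitively in foo's text and in bar's text.
sample_tags = {el.tag for el in find_elements_with_text(root, "SAMPLE")}
print(sample_tags)
```

Note that a match in bar's tail returns foo, not bar: in the XPath data model the tail is a text node that belongs to the enclosing element.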
