How to extract a text from multiple tags with Xpath (lxml)?

Question

Asked by acheruns on 27 February 2012 at 22:02

Let's say I have HTML like this:

<table>
  <tr>
    <td colspan=2>Date</td>
  </tr>
  <tr id='something'>
   <td>8 september</td>
   <td>2008</td>
  </tr>
</table>

I want to extract the date as a single string: "8 september 2008".

There are 2 answers

unutbu On 27 February 2012 at 22:40

You could collect the text from each td element, and join them with ' '.join(...):

import lxml.html as LH

content = '''
<table>
  <tr>
    <td colspan=2>Date</td>
  </tr>
  <tr id='something'>
   <td>8 september</td>
   <td>2008</td>
  </tr>
</table>
'''

doc = LH.fromstring(content)
date = ' '.join(td.text for td in doc.xpath('//table/tr[@id = "something"]/td'))
print(date)

yields

8 september 2008
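
One caveat with td.text: it only holds the text before the first child element, so it is None when a cell starts with nested markup, and the join then fails. A minimal sketch of a more tolerant variant, assuming a hypothetical version of the table where the day is wrapped in <b> (that markup is not in the question):

```python
import lxml.html as LH

# Hypothetical variant of the question's markup: the first cell wraps
# the day number in <b>, so td.text is None for that cell.
content = '''
<table>
  <tr id='something'>
   <td><b>8</b> september</td>
   <td>2008</td>
  </tr>
</table>
'''

doc = LH.fromstring(content)
cells = doc.xpath('//table/tr[@id = "something"]/td')

# itertext() yields every text node inside the cell, nested or not,
# so joining its pieces recovers the full cell text.
date = ' '.join(''.join(td.itertext()).strip() for td in cells)
print(date)
```

For the flat markup in the question, td.text is enough.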

Or, if you can live with the newlines and indentation between the cells, you could call the text_content() method on the row:

for tr in doc.xpath('//table/tr[@id = "something"]'):
    print(tr.text_content())

yields

8 september
   2008
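
If the extra whitespace is unwanted, it can be collapsed on the Python side. A small sketch, assuming the row exists (the [0] index raises IndexError otherwise):

```python
import lxml.html as LH

content = '''
<table>
  <tr>
    <td colspan=2>Date</td>
  </tr>
  <tr id='something'>
   <td>8 september</td>
   <td>2008</td>
  </tr>
</table>
'''

doc = LH.fromstring(content)
row = doc.xpath('//table/tr[@id = "something"]')[0]

# str.split() with no arguments splits on any run of whitespace
# (spaces, newlines, tabs), so re-joining with a single space
# collapses the layout whitespace between the cells.
date = ' '.join(row.text_content().split())
print(date)
```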

Dimitre Novatchev On 27 February 2012 at 23:02 BEST ANSWER

A pure XPath 1.0 solution.

Use:

string(normalize-space(//table/tr[@id = 'something']))
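
For completeness, a sketch of how this expression could be evaluated from lxml, reusing the markup from the question. normalize-space() already returns a string, so the outer string() call should not change the result; the sketch evaluates both forms to show this.

```python
import lxml.html as LH

content = '''
<table>
  <tr>
    <td colspan=2>Date</td>
  </tr>
  <tr id='something'>
   <td>8 september</td>
   <td>2008</td>
  </tr>
</table>
'''

doc = LH.fromstring(content)

# An XPath expression that evaluates to a string comes back from
# lxml as a str (sub)class rather than as a list of elements.
date = doc.xpath("string(normalize-space(//table/tr[@id = 'something']))")

# Same expression without the outer string() wrapper.
unwrapped = doc.xpath("normalize-space(//table/tr[@id = 'something'])")

print(date)
```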
