Let say I have code like this:
<table>
<tr>
<td colspan=2>Date</td>
</tr>
<tr id='something'>
<td>8 september</td>
<td>2008</td>
</tr>
</table>
I want to extract the date to have "8 september 2008".
You could collect the text from each td
element, and join them with ' '.join(...)
:
import lxml.html as LH
content = '''
<table>
<tr>
<td colspan=2>Date</td>
</tr>
<tr id='something'>
<td>8 september</td>
<td>2008</td>
</tr>
</table>
'''
doc = LH.fromstring(content)
date = ' '.join(td.text for td in doc.xpath('//table/tr[@id = "something"]/td'))
print(date)
yields
8 september 2008
Or, if you can handle the carriage returns, you could use the text_content()
method:
for td in doc.xpath('//table/tr[@id = "something"]'):
print(td.text_content())
yields
8 september
2008
A pure XPath 1.0 solution.
Use: