Let say I have code like this:
<table>
<tr>
<td colspan=2>Date</td>
</tr>
<tr id='something'>
<td>8 september</td>
<td>2008</td>
</tr>
</table>
I want to extract the date to have "8 september 2008".
On
You could collect the text from each td element, and join them with ' '.join(...):
import lxml.html as LH
content = '''
<table>
<tr>
<td colspan=2>Date</td>
</tr>
<tr id='something'>
<td>8 september</td>
<td>2008</td>
</tr>
</table>
'''
doc = LH.fromstring(content)
date = ' '.join(td.text for td in doc.xpath('//table/tr[@id = "something"]/td'))
print(date)
yields
8 september 2008
Or, if you can handle the carriage returns, you could use the text_content() method:
for td in doc.xpath('//table/tr[@id = "something"]'):
print(td.text_content())
yields
8 september
2008
A pure XPath 1.0 solution.
Use: