xpath matching wrong node

The XPath

//*[h1]

gives different results in Python (lxml) and in Firebug. My code:

import requests
from lxml import html

url = "http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/"
resp = requests.get(url)
page = html.fromstring(resp.content)

node = page.xpath("//*[h1]")
print(node)
# [<Element center at 0x7fb42143c7e0>]

But Firebug matches a <header> element, which is what I want.

Why is that? How do I make my Python code match <header> too?

There is 1 answer

Answer by Anzel (best answer)

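Your request is missing a User-Agent header, so the server responds with 403 Forbidden. The content you are parsing is therefore the error page, not the article, which is why the XPath matched a <center> element instead of <header>. Add the header to the request and it works as expected: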

In [9]: resp = requests.get(url, headers={"User-Agent": "Test Agent"})

In [10]: page = html.fromstring(resp.content)

In [11]: node = page.xpath("//*[h1]")

In [12]: print(node)
[<Element header at 0x104ff15d0>]
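
To see why the same XPath can match two different elements, here is a minimal offline sketch. It makes several assumptions: the two markup strings are made-up stand-ins for the error page and the real article (not the site's actual HTML), parents_of_h1 is a hypothetical helper, and it uses the standard library's xml.etree.ElementTree rather than lxml so it runs without extra packages. ElementTree supports only a subset of XPath, but the [h1] child predicate is part of that subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two response bodies (not the site's real markup).
FORBIDDEN_PAGE = "<html><body><center><h1>403 Forbidden</h1></center></body></html>"
ARTICLE_PAGE = "<html><body><header><h1>Article title</h1></header></body></html>"


def parents_of_h1(markup):
    """Return the tag names of all elements that have an <h1> child."""
    root = ET.fromstring(markup)
    # ".//*[h1]" selects every descendant element that has a direct <h1> child.
    return [element.tag for element in root.findall(".//*[h1]")]


print(parents_of_h1(FORBIDDEN_PAGE))  # ['center']
print(parents_of_h1(ARTICLE_PAGE))    # ['header']
```

The expression is the same in both cases; only the document differs. In the original code, checking resp.status_code (or calling resp.raise_for_status()) before parsing would have surfaced the 403 right away.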