The XPath //*[h1] gives different results in Python and in Firebug. My code:
import requests
from lxml import html
url = "http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/"
resp = requests.get(url)
page = html.fromstring(resp.content)
node = page.xpath("//*[h1]")
print(node)
# [<Element center at 0x7fb42143c7e0>]
But Firebug matches a <header> tag, which is what I want.
Why is this so? How do I make my Python code match <header> too?
You are missing the User-Agent header, so the server responds with 403 Forbidden and your XPath runs against that error page instead of the article. Add the header to the request and it works as expected.
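A minimal sketch of the fix, assuming the site only checks that a browser-like User-Agent is present. The header value and the helper names parents_of_h1 and fetch are illustrative, not something the site or the libraries require:

```python
import requests
from lxml import html

# Assumed value: any common browser-like User-Agent string should do.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def parents_of_h1(markup):
    """Return the tag names of all elements that have an <h1> child."""
    page = html.fromstring(markup)
    return [node.tag for node in page.xpath("//*[h1]")]


def fetch(url):
    """Fetch a page with a User-Agent header and fail loudly on 4xx/5xx."""
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()  # raises requests.HTTPError on a 403
    return resp.content


if __name__ == "__main__":
    url = "http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/"
    print(parents_of_h1(fetch(url)))
```

Calling raise_for_status() (or checking resp.status_code) before parsing makes a blocked request show up as an error rather than as a surprising XPath match on the error page.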