xpath matching wrong node

Question

67 views Asked by anupamGak At 24 June 2015 at 12:10

The XPath expression

//*[h1]

gives different results in Python and in Firebug. My code:

import requests
from lxml import html

url = "http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/"
resp = requests.get(url)
page = html.fromstring(resp.content)

node = page.xpath("//*[h1]")
print(node)
# [<Element center at 0x7fb42143c7e0>]

But Firebug matches a <header> element, which is what I want.

Why does this happen? How do I make my Python code match <header> too?

There is 1 answer

**Anzel** · Accepted Answer · 2015-06-24T12:33:00+00:00

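Your request is missing a User-Agent header, so the server responds with 403 Forbidden and lxml ends up parsing the error page rather than the article, which would explain the <center> match. Add the header to the request and it works as expected: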

In [9]: resp = requests.get(url, headers={"User-Agent": "Test Agent"})

In [10]: page = html.fromstring(resp.content)

In [11]: node = page.xpath("//*[h1]")

In [12]: print(node)
[<Element header at 0x104ff15d0>]
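
A minimal sketch of how the silent 403 could be caught before parsing. It assumes the error page looked roughly like the hypothetical `forbidden_body` below; the actual response body is not shown in the question. `parse_checked` and both sample bodies are made up for illustration and are not part of requests or lxml.

```python
from lxml import html


def parse_checked(status_code, content):
    """Parse HTML only when the HTTP status is 200 (hypothetical helper)."""
    if status_code != 200:
        raise ValueError("unexpected HTTP status: %d" % status_code)
    return html.fromstring(content)


# Hypothetical bodies for illustration only; the real pages are not reproduced here.
forbidden_body = b"<html><body><center><h1>403 Forbidden</h1></center></body></html>"
article_body = b"<html><body><header><h1>Title</h1></header></body></html>"

# Parsing the error page without any check reproduces the surprising match.
error_tags = [el.tag for el in html.fromstring(forbidden_body).xpath("//*[h1]")]
print(error_tags)

# With the check in place, the 403 is reported instead of being parsed.
try:
    parse_checked(403, forbidden_body)
    rejected = False
except ValueError:
    rejected = True
print(rejected)

# A successful response is parsed as usual.
article_tags = [el.tag for el in parse_checked(200, article_body).xpath("//*[h1]")]
print(article_tags)
```

With requests, the same check would be `parse_checked(resp.status_code, resp.content)`. Alternatively, calling `resp.raise_for_status()` before `html.fromstring` raises an exception on a 4xx or 5xx response.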
