How can I parse a locally saved HTML file into the same lxml.html.HtmlElement type that I get from a requests response, so that one set of parsing code works for both?

I can parse a web page in Python when fetching the URL with the requests package, like this:

import requests
from lxml import html

xreq = requests.get('some url')
xreqtree = html.fromstring(xreq.content)

However, I'd like to parse the same web page from a saved file, like this:

from lxml import etree

# "with" closes the file automatically, even if parsing fails
with open("someurlsfile.html", "r", encoding="utf-8") as f:
    xftreetmp = etree.parse(f)
xftree = xftreetmp.getroot()

At this point, both trees start at the root html element, but the requests version is of type lxml.html.HtmlElement while the file version is of type lxml.etree._Element. I would like to open the file and get an lxml.html.HtmlElement root, so that I can use one set of parsing code for the page.
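A possible approach, offered as a sketch rather than a definitive answer: lxml.html.fromstring is the same function already used on xreq.content, so feeding it the file's bytes should yield the same HtmlElement root. An alternative keeps etree.parse but passes it lxml.html.HTMLParser, which is configured to build lxml.html element classes instead of plain etree elements (etree.parse otherwise uses the default XML parser). The helper names load_html_file, load_html_tree and count_links, and the sample page, are hypothetical and only for illustration.

```python
import os
import tempfile

from lxml import etree, html


def load_html_file(path):
    """Parse a saved page into an lxml.html.HtmlElement root (hypothetical helper).

    Passing the raw bytes to html.fromstring mirrors
    html.fromstring(xreq.content) in the requests version.
    """
    with open(path, "rb") as f:
        return html.fromstring(f.read())


def load_html_tree(path):
    # Alternative that keeps etree.parse, but with lxml.html's HTML parser
    # instead of the default XML parser, so the elements are HtmlElement.
    return etree.parse(path, html.HTMLParser()).getroot()


def count_links(root):
    # Hypothetical stand-in for the shared parsing code: it relies only on
    # the root element, not on where the page came from.
    return len(root.xpath("//a"))


if __name__ == "__main__":
    # Write a small made-up sample page so the sketch runs without network access.
    sample = b"<html><body><a href='/x'>x</a><a href='/y'>y</a></body></html>"
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "wb") as f:
        f.write(sample)
    try:
        for root in (load_html_file(path), load_html_tree(path)):
            # prints "True 2" for both loaders
            print(isinstance(root, html.HtmlElement), count_links(root))
    finally:
        os.remove(path)
```

Opening the file in binary mode is a design choice here: it hands lxml raw bytes, much as xreq.content does, and lets lxml work out the encoding from the page itself.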
