I can parse a web page in Python when I fetch it by URL with the requests package. I'd like to parse the same page from a saved file using the same parsing code, but I can't get the top-level object from the file to be the same type as the one from the web request.
I can parse a web page in Python when fetching the URL with the requests package, like this:
import requests
from lxml import html
xreq = requests.get('some url')
xreqtree = html.fromstring(xreq.content)
However, I'd like to parse the same page when reading it from a file, like this:
from lxml import etree
f = open("someurlsfile.html", "r", encoding="utf-8")
xftreetmp = etree.parse(f)
f.close()
xftree = xftreetmp.getroot()
At this point, both objects are the root html element. However, the one from the request is of type lxml.html.HtmlElement, while the one from the file is of type lxml.etree._Element. I would like to open the file and get an lxml.html.HtmlElement so that I can use one set of parsing code for the page.
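For reference, here is a minimal sketch of one way this might be done, assuming the saved file contains the same HTML that the request returns. The helper names load_html_file and load_html_bytes are hypothetical. The idea is to read the file through the lxml.html module rather than lxml.etree, so the root comes back as an lxml.html.HtmlElement:

```python
from pathlib import Path

from lxml import html


def load_html_file(path):
    """Parse a saved HTML file with lxml's HTML parser and return its root."""
    # Binary mode lets the parser work out the encoding from the document.
    with open(path, "rb") as f:
        # html.parse returns an ElementTree; its root is an HtmlElement.
        return html.parse(f).getroot()


def load_html_bytes(path):
    """Alternative: read the raw bytes and reuse the call made on xreq.content."""
    return html.fromstring(Path(path).read_bytes())
```

Either helper could then be called as xftree = load_html_file("someurlsfile.html"), after which the same parsing code used on xreqtree should apply to xftree.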