pugixml xpath -- node not found


I'm using pugixml's XPath functions to find certain nodes within an HTML document (downloaded through curl).

I am using:

pugi::xml_document doc;
doc.load_buffer(htmlcontent.c_str(), htmlcontent.size());

pugi::xpath_node example = doc.select_single_node("//h2[@class='tv_header']");
std::cout << example.node();

which returns 0 nodes. I know that this node exists in the document: when I put just that node in a string by itself, the query finds it. Why is the node not found within the full document? Is there some issue with the encoding of the HTML document?

Thanks!

1 answer

zeuxcg (best answer):

Most likely, parsing stops at an error before the parser reaches that node, so the node never makes it into the document tree.

HTML documents generally cannot be parsed by XML parsers; unless your document is valid XHTML, you need to use an HTML parser.

To verify this, check the result object returned by load_buffer, for example:

pugi::xml_parse_result res = doc.load_buffer(htmlcontent.c_str(), htmlcontent.size());

std::cout << "Parsing result: " << res.description() << std::endl;
if (!res) std::cout << "Parsing stopped at offset " << res.offset << std::endl;