Background:
I'm using HTML::TreeBuilder to parse an entire HTML page, call it "whole_page" for reference's sake. I'm then using the parse_content method (the same one used for whole_page) of a new TreeBuilder object to parse a chunk of HTML, call it "html_to_insert". The root element of html_to_insert should be a <div> tag. Ultimately, the html_to_insert tree needs to be inserted into the whole_page tree.
Problem:
The html_to_insert tree is being wrapped with <html>, <head> and <body> tags, which I obviously don't need. I looked at HTML::Parser to see if there was a parameter that might solve the problem, but I couldn't find anything.
Question:
Is there a simple way to stop the parse method from wrapping html_to_insert in the unneeded tags? Given what I'm trying to do, am I going about this ass-backwards (is there a better way)?
Thanks for any help.
If you can ensure your HTML is XHTML-compliant, that is, a well-formed XML document, you may be able to use XML tools to do the job instead. I've used XML::Twig for this type of job in the past, and it was a bit easier that way.
Of course, if you're parsing arbitrary web pages from the internet, you may not have this type of guarantee.
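For what it's worth, here is a rough sketch of how the XML::Twig route might look. It assumes both inputs are well-formed XML. The two strings are made-up stand-ins for whole_page and html_to_insert, and appending the fragment to the end of <body> is just an illustrative choice, so adjust the target element to wherever the <div> actually belongs.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical stand-ins for whole_page and html_to_insert.
# Both must be well-formed XML for this approach to work.
my $whole_page     = '<html><head><title>t</title></head><body><p>existing</p></body></html>';
my $html_to_insert = '<div id="new"><p>inserted</p></div>';

# Parse the full page into a twig.
my $twig = XML::Twig->new;
$twig->parse($whole_page);

# Build a standalone element from the fragment. Because this is an
# XML parse, no <html>/<head>/<body> wrappers are added around it.
my $fragment = XML::Twig::Elt->parse($html_to_insert);

# Attach the fragment as the last child of <body> (assumed target).
my $body = $twig->root->first_child('body');
$fragment->paste( last_child => $body );

# Serialize the combined document.
my $result = $twig->sprint;
print $result, "\n";
```

Since the fragment is parsed as its own small XML document, its root element is the <div> itself, which sidesteps the wrapping problem entirely. The catch is the one already mentioned: the parse dies on markup that isn't well-formed.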