Perl HTML::TreeBuilder adding <html>, <head> and <body> tags to parsed content, how to stop or work around it?

Question

858 views Asked by s2cuts At 12 October 2011 at 17:05

Background:
I'm using HTML::TreeBuilder to parse an entire HTML page, say "whole_page" for reference's sake. I'm then using the inherited parse_content method (same as for whole_page) of a new TreeBuilder object to parse a chunk of HTML, say "html_to_insert". The root element of html_to_insert should be a <div> tag. Ultimately, the html_to_insert tree needs to be inserted into the whole_page tree.

Problem:
The html_to_insert tree is being wrapped with <html>, <head> and <body> tags, which I obviously don't need. I looked at HTML::Parser to see if there was a parameter that might solve the problem, but I couldn't find anything.

Question:
Is there a simple way to stop the parse method from wrapping html_to_insert with the unneeded tags? Knowing what I'm trying to do, am I doing this ass backwards (is there a better way)?

Thanks for any help.

Original Q&A

There are 2 answers

**Tanktalus** · Answer 1 · 2011-10-12T17:14:26+00:00

If you can ensure your HTML is XHTML-compliant, that is, it's a proper XML document, you may be able to use XML tools to do the job instead. In the past, I've used XML::Twig for this type of job, and it was a bit easier that way.

Of course, if you're parsing arbitrary web pages from the internet, you may not have this type of guarantee.
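As a rough sketch of the XML::Twig route, assuming both the page and the fragment are well-formed XML: the two input strings and all variable names below are made up for illustration and stand in for whole_page and html_to_insert from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical well-formed inputs standing in for whole_page and html_to_insert.
my $whole_page_xhtml = '<html><head><title>Demo</title></head>'
                     . '<body><p>Existing content</p></body></html>';
my $xhtml_to_insert  = '<div id="new"><p>Inserted content</p></div>';

# Parse the full page; parse() dies if the input is not well-formed XML.
my $twig = XML::Twig->new;
$twig->parse($whole_page_xhtml);

# Build a standalone element from the fragment. An XML parser adds no
# wrapper elements, so the root of the fragment is the <div> itself.
my $fragment = XML::Twig::Elt->parse($xhtml_to_insert);

# Attach the fragment as the last child of the page's <body>.
my $body = $twig->first_elt('body');
$fragment->paste(last_child => $body);

my $result = $twig->sprint;
print $result, "\n";
```

Because the parse is strict, this only helps when you control the markup; a stray unclosed tag makes parse() throw instead of guessing at the structure.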

**bvr** · Answer 2 · 2011-10-12T17:31:39+00:00

You might want to look at the guts method in HTML::Tree. It returns only the non-implicit nodes as a list.
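A minimal sketch of how guts could be used here, assuming the fragment has a single top-level <div> as described in the question. The sample HTML strings and variable names are invented for illustration. Called in scalar context, guts should hand back that single element rather than a list.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-ins for the question's whole_page and html_to_insert.
my $whole_page_html = '<html><head><title>Demo</title></head>'
                    . '<body><p>Existing content</p></body></html>';
my $html_to_insert  = '<div id="new"><p>Inserted content</p></div>';

# Parse the full page as usual.
my $whole_page = HTML::TreeBuilder->new;
$whole_page->parse_content($whole_page_html);

# Parse the fragment; the builder still creates implicit
# <html>, <head> and <body> nodes around it.
my $fragment_builder = HTML::TreeBuilder->new;
$fragment_builder->parse_content($html_to_insert);

# guts() skips the implicit wrapper nodes. In scalar context, with exactly
# one top-level element in the fragment, it returns that element (the <div>).
my $fragment = $fragment_builder->guts;

# Move the <div> into the page's <body>.
my $body = $whole_page->look_down(_tag => 'body');
$body->push_content($fragment);

my $result = $whole_page->as_HTML;
print $result, "\n";
```

If the fragment can contain several top-level nodes, calling guts in list context and passing the whole list to push_content may be the safer choice, since scalar context then wraps them in an extra <div>.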
