HTML::TreeBuilder::XPath missing last tag in result

use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
my $mech = WWW::Mechanize->new;
my $tree = HTML::TreeBuilder::XPath->new;
my $url = "http://www.elaws.gov.bw/wondersbtree.php";
$mech->get($url);
$tree->parse($mech->content());
my @nodes = $tree->findnodes("//p[font = 'PRINCIPAL LEGISLATION']");
print $nodes[0]->as_HTML;

The above code prints out the HTML element searched for, but it is missing the final </p> tag. Why? Is this intentional or is it a bug in the module?


There are 2 answers

ikegami answered:

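In HTML, the end tag is optional for p elements, so a p without </p> is still valid HTML: parsers infer where the paragraph ends. The output is intentional, not a bug.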
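A minimal sketch of that point, assuming HTML::TreeBuilder::XPath is installed: because the parser infers the missing </p>, a fragment written with explicit end tags and one written without them should both yield the same two paragraphs. The helper name paragraph_texts is hypothetical, chosen only for this illustration.

use strict;
use warnings 'all';
use 5.010;

use HTML::TreeBuilder::XPath;

# Hypothetical helper: parse a fragment and return the text of every p element.
sub paragraph_texts {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my @texts = map { $_->as_text } $tree->findnodes('//p');
    $tree->delete;    # break the tree's circular references
    return @texts;
}

# The same two paragraphs, with and without explicit </p> end tags.
for my $html ('<p>foo</p><p>bar</p>', '<p>foo<p>bar') {
    say "$html => ", join(', ', paragraph_texts($html));
}

Both lines of output should list the same two paragraphs, which is why dropping </p> on output loses no information.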

ThisSuitIsBlackNot answered:

By default, the as_HTML method omits certain optional end tags:

as_HTML

$s = $h->as_HTML();
$s = $h->as_HTML($entities);
$s = $h->as_HTML($entities, $indent_char);
$s = $h->as_HTML($entities, $indent_char, \%optional_end_tags);

[ ... ]

If \%optional_end_tags is specified and defined, it should be a reference to a hash that holds a true value for every tag name whose end tag is optional. Defaults to \%HTML::Element::optionalEndTag, which is an alias to %HTML::Tagset::optionalEndTag, which, at time of writing, contains true values for p, li, dt, dd. A useful value to pass is an empty hashref, {}, which means that no end-tags are optional for this dump.
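If you only want </p> back and are happy for the other tags to stay optional, the hash can be built from the default set instead of being empty. A sketch, assuming HTML::Tagset (a dependency of HTML::Tree) is available; the %optional hash and the sample markup are illustrative only.

use strict;
use warnings 'all';
use 5.010;

use HTML::TreeBuilder::XPath;
use HTML::Tagset;

# Copy the default set of optional end tags...
my %optional = %HTML::Tagset::optionalEndTag;

# ...and remove p, so that </p> is always written out.
delete $optional{p};

my $tree = HTML::TreeBuilder::XPath->new_from_content('<ul><li><p>foo</p></li></ul>');
my ($list) = $tree->findnodes('//ul');

# Third argument: the tags whose end tags as_HTML is allowed to omit.
my $html = $list->as_HTML(undef, undef, \%optional);
say $html;

$tree->delete;    # break the tree's circular references

With this hash, </p> should appear in the output, while end tags for tags still in %optional (such as li) may be omitted as before.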

For example:

use strict;
use warnings 'all';
use 5.010;

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content('<p>foo</p>');
my @nodes = $tree->findnodes('//p');

# An empty hashref means no end tags are treated as optional, so </p> is kept.
say $nodes[0]->as_HTML(undef, undef, {});

Output:

<p>foo</p>

Also, you should always enable strict and warnings (use strict; use warnings 'all';) in your scripts.