I am using WWW::Mechanize, HTML::TreeBuilder and HTML::Element in my Perl script to navigate through HTML documents.
I want to know how to search for an element that contains a certain string as its text.
Here is an example of an HTML document:
<html>
<body>
<ul>
<li>
<div class="red">Apple</div>
<div class="abc">figure = triangle</div>
</li>
<li>
<div class="red">Banana</div>
<div class="abc">figure = square</div>
</li>
<li>
<div class="green">Lemon</div>
<div class="abc">figure = circle</div>
</li>
<li>
<div class="blue">Banana</div>
<div class="abc">figure = line</div>
</li>
</ul>
</body>
</html>
I want to extract the text "square". To get it, I have to search for an element with these properties:
- tag-name is "div"
- class is "red"
- content is text "Banana"
Then I need to get its parent (a <li> element), and from the parent the child whose text starts with "figure =", but this, and the rest, is easy.
I tried it this way:
use strict;
use warnings;
use utf8;
use Encode;
use WWW::Mechanize;
use HTML::TreeBuilder;
use HTML::Element;
binmode STDOUT, ":utf8";
my $mech = WWW::Mechanize->new();
my $uri = 'http.....'; #URI of an existing html-document
$mech->get($uri);
if (($mech->success()) && ($mech->is_html())) {
    my $resp = $mech->response();
    my $cont = $resp->decoded_content;
    my $root = HTML::TreeBuilder->new_from_content($cont);
    # this works, but returns 2 elements:
    my @twoElements = $root->look_down('_tag' => 'div', 'class' => 'red');
    # this returns an empty list:
    my @empty = $root->look_down('_tag' => 'div', 'class' => 'red', '_content' => 'Banana');
    # do something with @twoElements or @empty
}
What must I use instead of the last command to get the wanted element?
I am not looking for a workaround (I've found one). What I want is a native function of WWW::Mechanize, HTML::Tree or any other CPAN module.
Here's pseudocode / untested Perl:
Not perfect, but it should get you started, and it's general enough to reuse easily. Otherwise replace
with something like
This might be a little cleaner:
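my @elements = $root->look_down('_tag' => 'div', 'class' => 'red');
foreach my $e (@elements) {
    # keep only the div whose text is exactly "Banana"
    next unless $e->as_trimmed_text eq 'Banana';
    # in scalar context, right() returns the immediate right sibling
    my $e2 = $e->right;
    my ($shape) = $e2->as_trimmed_text =~ /figure = (.+)/;
    # $shape now holds the text after "figure = "
}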
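The text filter can also be folded into look_down itself, since HTML::Element accepts a code reference as a criterion alongside the attribute/value pairs; the candidate element is passed to it as $_[0]. The following is only a sketch under assumptions: the markup is a trimmed inline copy of the question's sample instead of a page fetched with WWW::Mechanize, and the variable names (@matches, @shapes, $sibling) are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Inline copy of the sample document from the question (assumption:
# in the real script this would come from $mech->response->decoded_content).
my $html = <<'HTML';
<html><body><ul>
<li><div class="red">Apple</div><div class="abc">figure = triangle</div></li>
<li><div class="red">Banana</div><div class="abc">figure = square</div></li>
<li><div class="green">Lemon</div><div class="abc">figure = circle</div></li>
<li><div class="blue">Banana</div><div class="abc">figure = line</div></li>
</ul></body></html>
HTML

my $root = HTML::TreeBuilder->new_from_content($html);

# Attribute criteria plus a code ref: the sub must return true
# for an element to be included in the result.
my @matches = $root->look_down(
    _tag  => 'div',
    class => 'red',
    sub { $_[0]->as_trimmed_text eq 'Banana' },
);

my @shapes;
for my $div (@matches) {
    # Search the enclosing <li> for the div whose text starts with "figure = ".
    my $sibling = $div->parent->look_down(
        _tag => 'div',
        sub { $_[0]->as_trimmed_text =~ /^figure = / },
    );
    next unless $sibling;
    my ($shape) = $sibling->as_trimmed_text =~ /^figure = (.+)/;
    push @shapes, $shape;
}

print "$_\n" for @shapes;

# Free the tree explicitly, as the HTML::Tree docs recommend.
$root->delete;
```

Searching from the parent rather than calling right() keeps the lookup independent of the order of the two div elements inside the <li>.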
WWW::Mechanize::TreeBuilder