GENERAL IDEA
Here is a snippet of what I'm working with:
my $url_temp;
my $page_temp;
my $p_temp;
my @temp_stuff;
my @collector;
foreach (@blarg_links) {
    $url_temp = $_;
    $page_temp = get( $url_temp ) or die $!;
    $p_temp = HTML::TreeBuilder->new_from_content( $page_temp );
    @temp_stuff = $p_temp->look_down(
        _tag => 'foo',
        class => 'bar'
    );
    foreach (@temp_stuff) {
        push(@collector, "http://www.foobar.sx" . $1) if $_->as_HTML =~ m/href="(.*?)"/;
    };
};
Hopefully it is clear that what I'm hopelessly trying to do is push the link endings found in each of a list of links into an array called @temp_stuff. So the first link in @blarg_links, when visited, has one or more foo tags with an associated bar class that, when acted on by as_HTML, will match something I want in the href attribute, which I then pump into an array of links that have the data I'm really after... Does that make sense?
ACTUAL DATA
my $url2 = 'http://www.chemistry.ucla.edu/calendar-node-field-date/year';
my $page2 = get( $url2 ) or die $!;
my $p2 = HTML::TreeBuilder->new_from_content( $page2 );
my @stuff2 = $p2->look_down(
    _tag => 'div',
    class => 'year mini-day-on'
);
my @chem_links;
foreach (@stuff2) {
    push(@chem_links, $1) if $_->as_HTML =~ m/(http:\/\/www\.chemistry\.ucla\.edu\/calendar-node-field-date\/day\/[0-9]{4}-[0-9]{2}-[0-9]{2})/;
};
my $url_temp;
my $page_temp;
my $p_temp;
my @temp_stuff;
my @collector;
foreach (@chem_links) {
    $url_temp = $_;
    $page_temp = get( $url_temp ) or die $!;
    $p_temp = HTML::TreeBuilder->new_from_content( $page_temp );
    @temp_stuff = $p_temp->look_down(
        _tag => 'span',
        class => 'field-content'
    );
};
foreach (@temp_stuff) {
    push(@collector, "http://www.chemistry.ucla.edu" . $1) if $_->as_HTML =~ m/href="(.*?)"/;
};
n.b. - I want to use HTML::TreeBuilder. I'm aware of alternatives.
This is a rough attempt at what I think you want.
It fetches all the links on the first page and visits each of them in turn, printing the link in each <span class="field-content"> element.
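The shape of that loop can be tried offline with a minimal sketch like the one below. It is only an illustration under stated assumptions: the %pages hash and its example.com URLs are made-up stand-ins for pages you would normally fetch with get(), and extract_links is a hypothetical helper name, not part of HTML::TreeBuilder. The point it shows is that the links are collected per page, inside the loop over pages, and that the href is read with attr() rather than by matching the output of as_HTML.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use URI;

# Hypothetical stand-in for fetched pages, so the sketch runs without network access.
# In real use each value would come from LWP::Simple's get($url).
my %pages = (
    'http://www.example.com/day/2013-01-01' =>
        '<html><body><span class="field-content"><a href="/event/a">A</a></span></body></html>',
    'http://www.example.com/day/2013-01-02' =>
        '<html><body><span class="field-content"><a href="/event/b">B</a></span>'
      . '<span class="other"><a href="/ignored">X</a></span></body></html>',
);

# Return the absolute URL of every link found inside a
# <span class="field-content"> element of the given HTML.
sub extract_links {
    my ($html, $base) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @links;
    for my $span ($tree->look_down(_tag => 'span', class => 'field-content')) {
        for my $anchor ($span->look_down(_tag => 'a')) {
            my $href = $anchor->attr('href');
            next unless defined $href;
            # Resolve relative hrefs against the URL of the page they came from
            push @links, URI->new_abs($href, $base)->as_string;
        }
    }
    $tree->delete;    # release the parse tree once we are done with it
    return @links;
}

my @collector;
for my $url (sort keys %pages) {
    # Collect the links for this page before moving on to the next one
    push @collector, extract_links($pages{$url}, $url);
}

print "$_\n" for @collector;
```

Because the extraction happens inside the per-page loop, nothing from an earlier page is overwritten by a later one, which is what happens when @temp_stuff is reassigned on every pass and only examined after the loop has finished.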