I am trying to download a file from a web page.
First I get the links with HTML::LinkExtractor, and then I want to download them with LWP. I'm a newbie at programming in Perl.
I wrote the following code:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use HTML::LinkExtractor;
use LWP::Simple qw(get);
use Archive::Zip;
my $html = get $ARGV[0];
my $te = HTML::TableExtract->new(
keep_html => 1,
headers => [qw( column1 column2 )],
);
$te->parse($html);
# I get only the first row
my ($row) = $te->rows;
my $LXM = new HTML::LinkExtractor(undef,undef,1);
$LXM->parse(\$$row[0]);
my ($t) = $LXM->links;
my $LXS = new HTML::LinkExtractor(undef,undef,1);
$LXS->parse(\$$row[1]);
my ($s) = $LXS->links;
#-------
for (my $i=0; $i < scalar(@$s); $i++) {
print "$$s[$i]{_TEXT} $$s[$i]{href} $$t[$i]{href} \n";
my $file = '/tmp/$$s[$i]{_TEXT}';
my $url = $$s[$i]{href};
my $content = getstore($url, $file);
die "Couldn't get it!" unless defined $content;
}
And I get the following error:
Undefined subroutine &main::getstore called at ./geturlfromtable.pl line 35.
Thanks in advance!
LWP::Simple can be loaded in two different ways.
    use LWP::Simple;
This loads the module and makes all of its default functions available to your program.
    use LWP::Simple qw(list of function names);
This loads the module and only makes available the specific set of functions you have requested.
You have this code:
    use LWP::Simple qw(get);
This makes the get() function available, but not the getstore() function.
To fix this, either add getstore() to your list of functions:
    use LWP::Simple qw(get getstore);
Or (probably simpler) remove the list of functions:
    use LWP::Simple;
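If it helps to see the import-list behaviour in isolation, here is a minimal sketch that doesn't need the network. It uses the core module File::Basename purely as a stand-in for LWP::Simple (that substitution, and the package names WithList and WithoutList, are my own assumptions for illustration); the import mechanism is the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Import list given: only basename() is imported into this package.
{
    package WithList;
    use File::Basename qw(basename);
}

# No import list: all of the module's default exports are imported.
{
    package WithoutList;
    use File::Basename;
}

# Report which functions each package ended up with.
for my $pkg (qw(WithList WithoutList)) {
    for my $func (qw(basename dirname)) {
        my $imported = $pkg->can($func) ? 'yes' : 'no';
        print "$pkg has $func(): $imported\n";
    }
}
```

WithList should report that dirname() is missing, which is the same situation as calling getstore() after importing only get().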
Update: I hope you don't mind if I add a couple of style points.
Firstly, you're using a really old module - HTML::LinkExtractor. It hasn't been updated for almost fifteen years. I'd recommend looking at HTML::LinkExtor instead.
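As a rough idea of what that might look like, here is a small sketch using HTML::LinkExtor. The sample markup and the base URL are made up for illustration; in your program the HTML would come from the table cell. Note that HTML::LinkExtor only reports tags and their link attributes, not the link text, so you would need another way to choose the file name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample markup standing in for the contents of one table cell.
my $html = '<a href="/files/a.zip">a.zip</a> <a href="/files/b.zip">b.zip</a>';

# Hypothetical base URL, used to turn relative links into absolute ones.
my $base = 'http://example.com/';

# With no callback, the parser collects links for later retrieval via links().
my $parser = HTML::LinkExtor->new(undef, $base);
$parser->parse($html);
$parser->eof;

my @hrefs;
for my $link ($parser->links) {
    # Each link is an array reference: the tag name, then attribute/URL pairs.
    my ($tag, %attr) = @$link;
    next unless $tag eq 'a' && defined $attr{href};
    push @hrefs, "$attr{href}";    # stringify the URI object
}

print "$_\n" for @hrefs;
```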
Secondly, your code uses a lot of references, but you're using them in a really over-complicated way. For example, where you have \$$row[0], you can write the equivalent \$row->[0]. Similarly, $$s[$i]{href} will be easier for most people to understand if written as $s->[$i]{href}.
Next, you use the C-style for loop and iterate over the array's indexes. It's usually simpler to use foreach to iterate from zero to the last index in the array, for example foreach my $i (0 .. $#$s).
And finally, you seem slightly confused about what getstore() returns. It returns the HTTP response code, so it will never be undefined. If there's a problem retrieving the content, you'll get 500 or 403 or something like that.
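Putting those last three points together, the download loop might look something like the sketch below. The contents of $s are invented stand-ins for the links your program extracts, and I've assumed you'd rather warn and carry on than die on the first failure. It also uses double quotes for the file path, because single quotes would not interpolate the link text into the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical stand-in for the array reference the question stores in $s.
my $s = [
    { _TEXT => 'a.zip', href => 'http://example.com/files/a.zip' },
    { _TEXT => 'b.zip', href => 'http://example.com/files/b.zip' },
];

foreach my $i (0 .. $#$s) {
    my $url = $s->[$i]{href};

    # Double quotes so that the link text is interpolated into the path.
    my $file = "/tmp/$s->[$i]{_TEXT}";

    # getstore() returns the HTTP status code, for example 200 or 404.
    my $status = getstore($url, $file);

    warn "Couldn't fetch $url: HTTP status $status\n"
        unless is_success($status);
}
```

is_success() is one of the status helpers that LWP::Simple can export, and it is true for any 2xx status code.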