I have a huge webpage, about 5 GB in size, and I would like to read its content directly (remotely) without downloading the whole file. I tried to open the HTTP URL with a plain open file handle, but the error message given is "No such file or directory". I also tried LWP::Simple, but it ran out of memory when I used get to fetch the whole content. Is there a way to open this content remotely and read it line by line?
Thank you for your help.
About Perl reading the webpage online via HTTP
Asked by Chris Andrews

There are 2 answers
You could try using LWP::UserAgent. The request
method allows you to specify a CODE reference, which would let you process the data as it's coming in.
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent ();
use HTTP::Request ();

my $request = HTTP::Request->new(GET => 'http://www.example.com/');
my $ua      = LWP::UserAgent->new();

# The callback is invoked once per chunk of the response body, so the
# content can be processed as it arrives instead of being held in memory.
$ua->request($request, sub {
    my ($chunk, $res) = @_;
    print $chunk;
    return undef;
});
Technically the function should return the content instead of undef, but it seems to work if you return undef. According to the documentation:
The "content" function should return the content when called. The content function will be invoked repeatedly until it return an empty string to signal that there is no more content.
I haven't tried this on a large file, and you would need to write your own code to handle the data coming in as arbitrarily sized chunks.
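The callback above just prints each chunk. For the original goal of reading line by line, here is a minimal sketch (not part of either answer) that buffers the incoming chunks and hands off only complete lines; the handle_line function and the example URL are assumptions you would replace with your own.

#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent ();
use HTTP::Request ();

# Hypothetical per-line handler: replace with whatever processing you need.
sub handle_line {
    my ($line) = @_;
    print "LINE: $line";
}

my $buffer  = '';
my $ua      = LWP::UserAgent->new();
my $request = HTTP::Request->new(GET => 'http://www.example.com/');

$ua->request($request, sub {
    my ($chunk, $res) = @_;

    # Chunks arrive at arbitrary boundaries, so accumulate them and
    # hand off only the complete lines, keeping any partial line buffered.
    $buffer .= $chunk;
    while ($buffer =~ s/^(.*\n)//) {
        handle_line($1);
    }
});

# Whatever is left after the response ends is the final (unterminated) line.
handle_line($buffer) if length $buffer;

This way memory use is bounded by the longest line rather than the size of the whole document.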
This Perl code will download a file from a URL, with the possibility of resuming if the file was already partially downloaded.
The code requires that the server returns the file size (the Content-Length header) in response to a HEAD request, and also that the server supports byte ranges for the URL in question. If you want some special processing for the next chunk, just override it below: