Querying a website with Perl LWP::Simple to Process Online Prices

In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll one specific website's product pages and check the prices of products (I'm somewhat of a Perl noob). The script also keeps a very simple backlog of the last price seen for each item (since the prices change frequently).

I was wondering if there is any way I could further automate the script so that I don't have to explicitly add each page's URL to the initial hash (i.e. keep an array of key terms and run a search query on Amazon to find the page or price). Is there any way I could do this that doesn't involve just copying Amazon's search URL and inserting my keywords? (I'm aware that processing HTML with regex is generally bad form; I only used it since I need one small piece of data.)


#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my %oldPrice;
my %nameURL = (
    "Archer Season 1" => "http://www.amazon.com/Archer-Season-H-Jon-Benjamin/dp/B00475B0G2/ref=sr_1_1?ie=UTF8&qid=1297282236&sr=8-1",
    "Code Complete" => "http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670/ref=sr_1_1?ie=UTF8&qid=1296841986&sr=8-1",
    "Intermediate Perl" => "http://www.amazon.com/Intermediate-Perl-Randal-L-Schwartz/dp/0596102062/ref=sr_1_1?s=books&ie=UTF8&qid=1297283720&sr=1-1",
    "Inglorious Basterds (2-Disc)" => "http://www.amazon.com/Inglourious-Basterds-Two-Disc-Special-Brad/dp/B002T9H2LK/ref=sr_1_3?ie=UTF8&qid=1297283816&sr=8-3"
);

# load the last-seen prices from the backlog, if one exists
if (-e "backlog.txt"){
    open (LOG, "backlog.txt") or die "Cannot open backlog.txt: $!";
    while(<LOG>){
        chomp;
        my @temp = split(/:\s/);
        $oldPrice{$temp[0]} = $temp[1];
    }
    close(LOG);
}

print "\nChecking Daily Amazon Prices:\n";
open(LOG, ">backlog.txt") or die "Cannot write backlog.txt: $!";
foreach my $key (sort keys %nameURL){
    # fetch the product page and capture the first dollar amount on it
    my $content = get $nameURL{$key} or die "Could not fetch $nameURL{$key}";
    $content =~ m{\$(\d+\.\d+)} or die "No price found for $key";
    if (exists $oldPrice{$key} && $oldPrice{$key} != $1){
        print "$key: \$$1 (Was \$$oldPrice{$key})\n";
    }
    else{
        print "$key: \$$1\n";
    }
    print LOG "$key: $1\n";
}
close(LOG);

There are 2 answers

bvr (Best Answer)

I made a simple script to demonstrate Amazon search automation. The search URL for all departments is built by appending the escaped search term. The rest of the code is straightforward parsing with HTML::TreeBuilder. The structure of the HTML in question can easily be examined with the dump method (see the commented-out line).

use strict; use warnings;

use LWP::Simple;
use URI::Escape;
use HTML::TreeBuilder;
use Try::Tiny;

my $look_for = "Archer Season 1";

my $contents
  = get "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="
        . uri_escape($look_for);

my $html = HTML::TreeBuilder->new_from_content($contents);
for my $item ($html->look_down(id => qr/result_\d+/)) {
    # $item->dump;      # find out structure of HTML
    my $title = try { $item->look_down(class => 'productTitle')->as_trimmed_text };
    my $price = try { $item->look_down(class => 'newPrice')->find('span')->as_text };

    print "$title\n$price\n\n";
}
$html->delete;
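
To tie this back to the question's keyword list, the search can be wrapped in a small loop. Below is an untested sketch along those lines; the search_amazon helper and the hard-coded @key_terms list are just illustrative, and the productTitle / newPrice class names may stop working whenever Amazon changes its markup.

use strict;
use warnings;
use LWP::Simple;
use URI::Escape;
use HTML::TreeBuilder;
use Try::Tiny;

# Illustrative wrapper around the parsing above: returns the first
# (title, price) pair found for a search term, or an empty list.
sub search_amazon {
    my ($term) = @_;
    my $contents
      = get "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="
            . uri_escape($term);
    return unless defined $contents;

    my $html = HTML::TreeBuilder->new_from_content($contents);
    my ($title, $price);
    for my $item ($html->look_down(id => qr/result_\d+/)) {
        $title = try { $item->look_down(class => 'productTitle')->as_trimmed_text };
        $price = try { $item->look_down(class => 'newPrice')->find('span')->as_text };
        last if defined $title && defined $price;
    }
    $html->delete;
    return ($title, $price);
}

# Loop over key terms instead of hard-coded product URLs.
my @key_terms = ("Archer Season 1", "Code Complete", "Intermediate Perl");
for my $term (@key_terms) {
    my ($title, $price) = search_amazon($term);
    if (defined $title && defined $price) {
        print "$term => $title ($price)\n";
    }
    else {
        print "$term => nothing found\n";
    }
}
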
daxim

Yes, the design can be improved. It's probably best to delete everything and start over with an existing full-featured web scraping application or framework, but since you want to learn:

  1. The name-to-URL map is configuration data. Retrieve it from outside of the program.
  2. Store the historic data in a database.
  3. Learn XPath and use it to extract data from HTML; it's easy if you already grok CSS selectors. (A rough sketch of all three points follows after this list.)
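
For what those three points could look like in practice, here is a rough, untested sketch. It assumes a plain-text config file named products.conf with one "Name: URL" pair per line, an SQLite database via DBI, and HTML::TreeBuilder::XPath for the XPath query; the file name, table layout, and the //b[@class="priceLarge"] expression are only illustrative guesses, not something verified against Amazon's current markup.

use strict;
use warnings;
use LWP::Simple;
use DBI;
use HTML::TreeBuilder::XPath;

# 1. Configuration lives outside the program: one "Name: URL" pair per line.
my %nameURL;
open my $conf, '<', 'products.conf' or die "Cannot open products.conf: $!";
while (my $line = <$conf>) {
    chomp $line;
    my ($name, $url) = split /:\s+/, $line, 2;
    $nameURL{$name} = $url if defined $url;
}
close $conf;

# 2. Historic prices go into a database instead of a flat file.
my $dbh = DBI->connect('dbi:SQLite:dbname=prices.db', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE IF NOT EXISTS prices (name TEXT, price TEXT, checked_at TEXT)');
my $insert = $dbh->prepare(
    "INSERT INTO prices (name, price, checked_at) VALUES (?, ?, datetime('now'))"
);

# 3. Extract the price with an XPath query instead of a regex.
for my $name (sort keys %nameURL) {
    my $content = get $nameURL{$name} or next;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($content);
    my $price = $tree->findvalue('//b[@class="priceLarge"]');   # guessed selector
    $tree->delete;
    next unless $price;
    print "$name: $price\n";
    $insert->execute($name, $price);
}
$dbh->disconnect;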

Other stackers, if you want to amend my post with the rationale for each piece of advice, go ahead and edit it.