How can I replace the ISBN with the Google Books ID in a MARC file, using Perl?

1.3k views Asked by l0b0 At 03 November 2009 at 15:11

I've got a file with some book data in MARC format, in which some lines are ISBNs. I'd like to replace each of these lines with the Google Books ID for that ISBN, if one exists. Here's the code so far, which just ends up removing the lines:

perl -pe "s#ISBN(.*)#$(wget --output-document=- --quiet --user-agent=Mozilla/5.0 \"http://books.google.com/books?jscmd=viewapi&bibkeys=\1\")#mg" < 5-${file} > 6-${file}

PS: Google are a bit fuzzy on the use of automated tools: The Books Data API recommends tools like curl / wget, but there are no instructions on how to avoid being blocked when using such tools. I'm also pretty sure I saw a clause in a ToS saying users can't send automated queries, but I can't find it again. This is discussed in their forum.

There are 2 answers

Sinan Ünür On 03 November 2009 at 17:09

The reason you end up having to lie about the user agent is that you are violating Google's TOS. Don't do that.

Instead, use the Google Book Search API.

The code below is slightly hampered by my lack of familiarity with modules such as XML::Atom, Data::Feed, WWW::OpenSearch. However, it should provide a good starting point.

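#!/usr/bin/perl

use strict;
use warnings;

use Business::ISBN qw( valid_isbn_checksum );
use Carp;
use LWP::Simple;
use XML::Simple;

# Replace each "ISBN:<digits>" with the matching Google Books ID, if any.
while ( <> ) {
    s/ISBN:([0-9]+)/'Google Books ID:' . get_google_id_for_isbn($1)/ge;
    print;
}

# Build the search URL for a single ISBN.
sub make_google_books_query {
    return sprintf 'http://books.google.com/books/feeds/volumes?q=isbn:%s', $_[0];
}

# Return the Google Books ID for an ISBN, or '' if the lookup fails.
sub get_google_id_for_isbn {
    my ($isbn) = @_;

    # Any croak inside the eval leaves $google_id undefined.
    my $google_id = eval {
        # valid_isbn_checksum returns a defined but false value for a bad
        # checksum, so test for truth rather than for definedness.
        valid_isbn_checksum($isbn)
            or croak "Invalid ISBN: $isbn";

        my $query = make_google_books_query($isbn);
        my $xml = get $query;

        defined($xml)
            or croak "No response to <$query>";

        my $data = XMLin($xml, ForceArray => 1);
        my @ids = @{ $data->{entry}[0]{'dc:identifier'} };

        # The first identifier is the Google ID; one of the others must
        # match the ISBN that was searched for.
        my ($id, @others) = @ids;
        grep { $_ eq "ISBN:$isbn" } @others
            or croak "Invalid search results: '@ids'";

        $id;
    };

    return defined($google_id) ? $google_id : '';
}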

Given a text file t.txt containing:

ISBN:0060930314
ISBN:9780596520106

it outputs:

Google Books ID:ioXFqlzsmK8C
Google Books ID:lNVHi3TunxsC
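
The checksum test performed before each lookup can be illustrated without any CPAN modules. The following is only a sketch of the standard ISBN-10 and ISBN-13 check-digit rules; isbn_checksum_ok is a hypothetical helper written for this illustration, not Business::ISBN's own code, and it ignores prefix and publisher ranges entirely:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the checksum test; not Business::ISBN's code.
# Returns 1 for a good checksum, 0 for a bad one, undef if not ISBN-shaped.
sub isbn_checksum_ok {
    my ($isbn) = @_;
    my @c = split //, uc $isbn;

    if ( $isbn =~ /\A[0-9]{9}[0-9Xx]\z/ ) {
        # ISBN-10: weights run from 10 down to 1, a final 'X' counts as 10,
        # and the weighted sum must be divisible by 11.
        my ($sum, $weight) = (0, 10);
        $sum += ( $_ eq 'X' ? 10 : $_ ) * $weight-- for @c;
        return $sum % 11 == 0 ? 1 : 0;
    }
    if ( $isbn =~ /\A[0-9]{13}\z/ ) {
        # ISBN-13: weights alternate 1, 3, 1, 3, ... and the weighted sum
        # must be divisible by 10.
        my $sum = 0;
        $sum += $c[$_] * ( $_ % 2 ? 3 : 1 ) for 0 .. 12;
        return $sum % 10 == 0 ? 1 : 0;
    }
    return;    # neither 10 nor 13 characters of the right kind
}

for my $isbn (qw( 0060930314 9780596520106 0060930315 )) {
    printf "%s %s\n", $isbn, isbn_checksum_ok($isbn) ? 'ok' : 'bad checksum';
}
```

The first two values are the ISBNs from t.txt and pass; the third is the first one with its last digit altered, so it fails. Rejecting such values locally avoids sending a pointless request.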

**mob** · Accepted Answer · 2009-11-03T15:59:15+00:00

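The OP is on the right track and can do this with a one-liner; the bash-style syntax just needs to be replaced with the Perl equivalent. In the original command, the double-quoted shell string makes the shell expand $(wget ...) once, before Perl even starts, so the captured ISBN never reaches wget. With single quotes, qx() and the /e modifier, wget runs once per match instead. I think this would work (newlines added for readability):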

    perl -pe 's#ISBN(\w+)#qx(wget --output-document=- 
        --quiet --user-agent=Mozilla/5.0 
        "http://books.google.com/books\\?jscmd=viewapi\\&bibkeys=$1")#ge' \
        < 5-${file} > 6-${file}

You have to escape the ? and & characters in the URL (edit: double escaping seems to work).
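
Note that qx() returns whatever wget prints, so the substitution inserts the raw response rather than a bare ID. Below is a minimal sketch of pulling the ID out, assuming the jscmd=viewapi response is a JavaScript assignment that embeds a URL with an id= parameter. That shape is an assumption, the sample string is hand-written for illustration rather than captured output, and extract_google_id is a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the volume ID out of a viewapi-style response.
# Assumes the response embeds a URL containing "id=<volume id>".
# Returns '' when no ID is found, so the ISBN line is simply emptied.
sub extract_google_id {
    my ($response) = @_;
    return '' unless defined $response;
    # Require ? or & before "id=" so that a longer parameter name that
    # merely ends in "id" is not matched.
    return $response =~ /[?&]id=([A-Za-z0-9_-]+)/ ? $1 : '';
}

# Hand-written sample in the assumed shape; not real API output.
my $sample = 'var _GBSBookInfo = {"ISBN0060930314":{"bib_key":"ISBN0060930314",'
           . '"info_url":"http://books.google.com/books?id=EXAMPLE_ID&source=gbs_ViewAPI"}};';

print extract_google_id($sample), "\n";
```

In the one-liner, the same pattern match could be applied to the result of qx(...) inside the replacement expression, at which point a short script is probably easier to read than a single command line.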
