Categorize book authors as fiction vs non-fiction


For my own purposes, I have a list of about 300 authors (full names) of various books. I want to partition this list into "fiction authors" and "non-fiction authors". If an author writes both, the majority decides.
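The majority rule can be sketched as below, assuming each book has already been labelled "fiction" or "non-fiction" by some lookup. The function name `classify_author` and the choice to return `None` on a tie are assumptions for illustration; the rule above does not say how ties are broken.

```python
from collections import Counter


def classify_author(book_labels):
    """Label an author by the majority of their books' labels.

    book_labels: iterable of "fiction" / "non-fiction" strings, one per book.
    Returns the winning label, or None for an empty list or an exact tie
    (tie handling is an assumption, not part of the stated rule).
    """
    counts = Counter(book_labels)
    fiction = counts["fiction"]
    non_fiction = counts["non-fiction"]
    if fiction == non_fiction:
        return None
    return "fiction" if fiction > non_fiction else "non-fiction"


print(classify_author(["fiction", "non-fiction", "fiction"]))
```

Labels other than the two expected strings are simply ignored by the count, so unknown books do not affect the vote.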

I looked at the Amazon Product Search API: I can search by author (in Python), but I found no way to get the book category (fiction vs. the rest):

>>> node = api.item_search('Books', Author='Richard Dawkins')
>>> for book in node.Items.Item:
...     print(book.ItemAttributes.Title)

What are my options? I prefer to do this in Python.

3

There are 3 answers

1
Maxym On BEST ANSWER

Well, you can try another service: the Google Book Search API. To use it from Python, have a look at gdata-python-api. In its protocol, each entry in the result feed has a <dc:subject> node, which is probably what you need:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
      xmlns:gbs="http://schemas.google.com/books/2008" 
      xmlns:dc="http://purl.org/dc/terms"
      xmlns:gd="http://schemas.google.com/g/2005">
  <id>http://www.google.com/books/feeds/volumes</id>
  <updated>2008-08-12T23:25:35.000</updated>

  <!-- a lot of information here; those nodes were removed to save space -->
  <entry>

    <dc:creator>Jane Austen</dc:creator>
    <dc:creator>James Kinsley</dc:creator>
    <dc:creator>Fiona Stafford</dc:creator>
    <dc:date>2004</dc:date>
    <dc:description>
      If a truth universally acknowledged can shrink quite so rapidly into 
      the opinion of a somewhat obsessive comic character, the reader may reasonably feel ...
    </dc:description>
    <dc:format>382</dc:format>
    <dc:identifier>8cp-Z_G42g4C</dc:identifier>
    <dc:identifier>ISBN:0192802380</dc:identifier>
    <dc:publisher>Oxford University Press, USA</dc:publisher>
    <dc:subject>Fiction</dc:subject>
    <dc:title>Pride and Prejudice</dc:title>
    <dc:title>A Novel</dc:title>
  </entry>
</feed>

Of course, the feed also carries extra information about each book that you may not need (such as whether it is viewable on Google Books).
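As a rough sketch, the <dc:subject> values could be pulled out of a feed like the one above with the standard library alone. The namespaces are copied from that feed; `SAMPLE_FEED` is a trimmed, illustrative stand-in for a real response, and `subjects_by_title` is a hypothetical helper name, not part of any Google library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the feed shown above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms",
}

# Trimmed, illustrative feed: one entry with only the fields used here.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/terms">
  <entry>
    <dc:creator>Jane Austen</dc:creator>
    <dc:subject>Fiction</dc:subject>
    <dc:title>Pride and Prejudice</dc:title>
  </entry>
</feed>
"""


def subjects_by_title(feed_xml):
    """Return a list of (first title, [subjects]) pairs, one per <entry>."""
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("dc:title", default="", namespaces=NS)
        subjects = [
            s.text.strip()
            for s in entry.findall("dc:subject", NS)
            if s.text
        ]
        result.append((title, subjects))
    return result


print(subjects_by_title(SAMPLE_FEED))
```

An entry may carry several <dc:subject> nodes or none at all, so the helper returns a list per book rather than a single value.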

1
Reiner Gerecke On

Did you look at BrowseNodes? I have not used this API before, but BrowseNodes seem to correspond to Amazon's product categories. Maybe you will find more information there.

0
thomas On

After spending some time with the Amazon API, it looks like it doesn't provide the kind of information you want.

The documentation doesn't mention categories of that type, and if you serialize what the API sends you, there is not a single mention of fiction or non-fiction categories.

You can use this to print a nicely formatted XML string (you might want to redirect it to a file for easier reading) containing everything the API sends.

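from lxml import etree

# `api` is the same client object used in the question
node = api.item_search('Books', Author='Richard Dawkins')

# encoding='unicode' returns text rather than bytes, so it prints cleanly
print(etree.tostring(node, pretty_print=True, encoding='unicode'))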
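To check a serialized response for any mention of "fiction" without reading it by eye, something like the following could work. It is only a sketch: `elements_mentioning` is a hypothetical helper, and `SAMPLE` is a hand-written stand-in, not real API output (a real response would also carry an XML namespace in its tag names).

```python
import xml.etree.ElementTree as ET


def elements_mentioning(xml_text, needle):
    """Return (tag, text) for each element whose text contains needle.

    The comparison is case-insensitive and looks only at element text,
    not at attributes.
    """
    needle = needle.lower()
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.text.strip())
        for el in root.iter()
        if el.text and needle in el.text.lower()
    ]


# Hand-written stand-in for a serialized response (assumption).
SAMPLE = (
    "<Items><Item><ItemAttributes>"
    "<Title>The Selfish Gene</Title>"
    "<Author>Richard Dawkins</Author>"
    "</ItemAttributes></Item></Items>"
)

print(elements_mentioning(SAMPLE, "fiction"))
```

An empty list here means no element text mentions the word, which matches the observation above for this kind of response.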