Beautiful soup and bottlenose, how to parse correctly

779 views Asked by Astro David At 05 February 2016 at 12:38

I am currently trying to extract strings from the response of a bottlenose Amazon API request. Not wanting Russian hackers to pwn my webapp, I am trying to use Beautiful Soup, following this small webpage as a guide.

My current code:

import bottlenose as BN
import lxml
from bs4 import BeautifulSoup

amazon = BN.Amazon('MyAmznID','MyAmznSK','MyAmznAssTag',Region='UK', Parser=BeautifulSoup)
rank = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")

soup = BeautifulSoup(rank)

print rank
print soup.find('SalesRank').string

The current output from bottlenose looks like this:

<?xml version="1.0" ?><html><body><itemlookupresponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01"><operationrequest><httpheaders><header name="UserAgent" value="Python-urllib/2.7"></header></httpheaders><requestid>53f15ff4-3588-4e63-af6f-279bddc7c243</requestid><arguments><argument name="AWSAccessKeyId" value="################"></argument><argument name="AssociateTag" value="#########-##"></argument><argument name="ItemId" value="0198596790"></argument><argument name="Operation" value="ItemLookup"></argument><argument name="ResponseGroup" value="SalesRank"></argument><argument name="Service" value="AWSECommerceService"></argument><argument name="Timestamp" value="2016-02-04T11:05:48Z"></argument><argument name="Version" value="2011-08-01"></argument><argument name="Signature" value="################+##################="></argument></arguments><requestprocessingtime>0.0234130000000000</requestprocessingtime></operationrequest><items><request><isvalid>True</isvalid><itemlookuprequest><idtype>ASIN</idtype><itemid>0198596790</itemid><responsegroup>SalesRank</responsegroup><variationpage>All</variationpage></itemlookuprequest></request><item><asin>0198596790</asin><salesrank>124435</salesrank></item></items></itemlookupresponse></body></html>

So the bottlenose section works, but the Beautiful Soup section gives an error:

Traceback (most recent call last):
File "/Users/Fuck/Documents/Amazon/Bottlenose_amzn_prog/test.py", line 12, in <module>
print soup.find(Rank).string
NameError: name 'soup' is not defined

I am trying to extract the digits between the 'SalesRank' tags, but failing.

There are 2 answers

Astro David On 06 February 2016 at 10:35

OK, so I have dropped the Parser option from the bottlenose call. Instead, I pass the raw response to BeautifulSoup afterwards and tell it to use the XML parser.

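import bottlenose as BN
import lxml  # not used directly; the "xml" parser below needs lxml installed
from bs4 import BeautifulSoup

amazon = BN.Amazon('##############','##############','##########',Region='UK')
# No Parser argument here, so ItemLookup returns the raw XML response
rank = amazon.ItemLookup(ItemId="specifiedItemId",ResponseGroup="SalesRank")
# Parse as XML so tag names such as SalesRank keep their original case
soup = BeautifulSoup(rank, "xml")
print " "
print soup.SalesRank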

I am a fairly novice user of Python, so sometimes it's the simple things that get me.
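
Note that soup.SalesRank is the whole tag rather than just the digits. A minimal sketch of getting at the number itself, assuming lxml is installed and using a made-up, trimmed-down response string in place of a real ItemLookup call (the names response, tag and rank are only for illustration; single-argument print(...) is used so it runs on Python 2 and 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for a real ItemLookup response.
response = '<Item><ASIN>0198596790</ASIN><SalesRank>124435</SalesRank></Item>'

soup = BeautifulSoup(response, 'xml')  # the 'xml' parser needs lxml installed

tag = soup.SalesRank         # a bs4 Tag object, i.e. the element with its markup
rank = int(tag.get_text())   # only the text between the tags, converted to int

print(tag)
print(rank)
```

If the response has no SalesRank element, soup.SalesRank is None, so a real script would probably want to check for that before calling get_text().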

**Brendan Quinn** · Accepted Answer · 2016-12-19T12:48:17+00:00

From looking at the code, it seems that the Bottlenose Parser option is very simple and takes a function as the parameter.

So you can just make a very simple Python function and pass it to the constructor, which makes your code look like this:

import bottlenose as BN
from bs4 import BeautifulSoup

def parse_xml(text):
    return BeautifulSoup(text, 'xml')

amazon = BN.Amazon(
    AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,
    AWS_ASSOCIATE_TAG,Region='UK', Parser=parse_xml
)
results = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")

print results.find('SalesRank').string

Or you can use a lambda in-line function instead:

import bottlenose as BN
from bs4 import BeautifulSoup

amazon = BN.Amazon(
    AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_ASSOCIATE_TAG,
    Region='UK', Parser=lambda text: BeautifulSoup(text, 'xml')
)
results = amazon.ItemLookup(ItemId="0198596790",ResponseGroup="SalesRank")

print results.find('SalesRank').string
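
The 'xml' argument matters because, as the output in the question shows, parsing the response as HTML lowercases every tag name, so a lookup for 'SalesRank' finds nothing. Here is a small offline sketch of that difference. It assumes bs4 and lxml are installed, uses an abbreviated hand-written copy of the response instead of a live request, and the helper name sales_rank is made up for the example. It uses 'html.parser' for the HTML case; the question's output (wrapped in html and body tags) suggests lxml's HTML parser was picked there, but the lowercasing is the same.

```python
from bs4 import BeautifulSoup

# Abbreviated stand-in for the ItemLookup response shown in the question.
SAMPLE_XML = (
    '<?xml version="1.0" ?>'
    '<ItemLookupResponse '
    'xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">'
    '<Items><Item><ASIN>0198596790</ASIN>'
    '<SalesRank>124435</SalesRank></Item></Items>'
    '</ItemLookupResponse>'
)


def sales_rank(soup, tag_name):
    """Return the tag's text as an int, or None if the tag is not found."""
    tag = soup.find(tag_name)
    return int(tag.string) if tag is not None else None


# HTML parsing lowercases tag names (bs4 may also warn that this looks like XML).
html_soup = BeautifulSoup(SAMPLE_XML, 'html.parser')
# XML parsing keeps the original case; this parser requires lxml.
xml_soup = BeautifulSoup(SAMPLE_XML, 'xml')

print(sales_rank(html_soup, 'SalesRank'))  # None: the tag is stored as 'salesrank'
print(sales_rank(html_soup, 'salesrank'))  # 124435
print(sales_rank(xml_soup, 'SalesRank'))   # 124435
```

The None check in the helper is there because find() returns None for a missing tag, and calling .string on None would raise an AttributeError.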
