ScraperWiki: how to save data into one cell in a table


Here is my scraper code, which extracts each idea's URL and the corresponding comments from that page:

import scraperwiki
import lxml.html
from BeautifulSoup import BeautifulSoup
import urllib2
import re

for num in range(1,2):
    html_page = urllib2.urlopen("https://success.salesforce.com/ideaSearch?keywords=error&pageNo="+str(num))
    soup = BeautifulSoup(html_page)
    for i in range(0,10):
        for link in soup.findAll('a',{'id':'search:ForumLayout:searchForm:itemObj2:'+str(i)+':idea:recentIdeasComponent:profileIdeaTitle'}):
             pageurl = link.get('href')
             html = scraperwiki.scrape(pageurl)
             root = lxml.html.fromstring(html)

             for j in range(0,300):
                 for table in root.cssselect("span[id='ideaView:ForumLayout:ideaViewForm:cmtComp:ideaComments:cmtLoop:"+str(j)+":commentBodyOutput'] table"):
                     divx = table.cssselect("div[class='htmlDetailElementDiv']")
                     if len(divx)==1:
                         data = {
                             'URL' : pageurl,
                             'Comment' : divx[0].text_content()
                         }
                         print data


             scraperwiki.sqlite.save(unique_keys=['URL'], data=data)
             scraperwiki.sqlite.save(unique_keys=['Comment'], data=data)

When the data is saved to the ScraperWiki datastore, only the last comment from each URL is put into the table. What I would like is for each URL's row to hold all of that URL's comments: one column with the URL and a second column with all the comments from that URL, instead of just the last comment, which is what this code ends up with.
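A minimal sketch of the underlying problem, with made-up comment strings standing in for the scraped values: each pass through the inner loop rebinds data to a brand-new dict, so everything built earlier is discarded and only the last dict survives to the save step.

comments = ["first comment", "second comment", "third comment"]  # stand-ins for divx[0].text_content()
pageurl = "http://example.com/idea"  # hypothetical URL

for c in comments:
    data = {'URL': pageurl, 'Comment': c}  # rebinds data, discarding the previous dict

print data  # {'URL': 'http://example.com/idea', 'Comment': 'third comment'}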

There is 1 answer.

Answer by zhangyangyu:

As I can see from your code, you build data in the innermost for loop and assign it a new value on every iteration, so when the loops finish and execution reaches the save step, data holds only the last comment. I think you may use something like this instead (note that text_content must be called, and the list of comments has to be joined into a single string before it can be stored in SQLite):

for i in range(0, 10):
    for link in soup.findAll('a', {'id': 'search:ForumLayout:searchForm:itemObj2:' + str(i) + ':idea:recentIdeasComponent:profileIdeaTitle'}):
        pageurl = link.get('href')
        html = scraperwiki.scrape(pageurl)
        root = lxml.html.fromstring(html)
        # one dict per URL; the comments for this page accumulate in the list
        data = {'URL': pageurl, 'Comment': []}

        for j in range(0, 300):
            for table in root.cssselect("span[id='ideaView:ForumLayout:ideaViewForm:cmtComp:ideaComments:cmtLoop:" + str(j) + ":commentBodyOutput'] table"):
                divx = table.cssselect("div[class='htmlDetailElementDiv']")
                if len(divx) == 1:
                    # call text_content(); appending the bare method would store a function object
                    data['Comment'].append(divx[0].text_content())

        # SQLite cannot store a Python list, so join the comments into one string
        data['Comment'] = '\n'.join(data['Comment'])
        scraperwiki.sqlite.save(unique_keys=['URL'], data=data)
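As a side note, scraperwiki.sqlite.save upserts on the unique_keys columns: a second save with the same key value replaces the existing row rather than adding a new one, which is why the comments have to be accumulated and joined before the single save above. A small sketch with hypothetical values, assuming the default swdata table:

import scraperwiki

scraperwiki.sqlite.save(unique_keys=['URL'], data={'URL': 'http://example.com/idea', 'Comment': 'first'})
scraperwiki.sqlite.save(unique_keys=['URL'], data={'URL': 'http://example.com/idea', 'Comment': 'second'})

# only one row remains, and its Comment is 'second' -- the first save was overwritten
print scraperwiki.sqlite.select("* from swdata")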