I'm trying to scrape an ASP-powered site using ScraperWiki's tools.
I want to grab a list of BBSes in a particular area code from the BBSmates.com website. The site displays 20 BBS search results at a time, so I will have to submit the form repeatedly to move from one page of results to the next.
This blog post helped me get started. I thought the following code would grab the final page of BBS listings for the 314 area code (page 79).
However, the response I get is the FIRST page.
import mechanize

url = 'http://bbsmates.com/browsebbs.aspx?BBSName=&AreaCode=314'
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Load page 1 of the search results.
response = br.open(url)
html = response.read()

# Everything on the page lives inside ASP.NET's single server-side form.
br.select_form(name='aspnetForm')
br.form.set_all_readonly(False)

# Fake the postback that the GridView's pager link would normally fire.
br['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$GridView1'
br['__EVENTARGUMENT'] = 'Page$79'
print br.form

response2 = br.submit()
html2 = response2.read()
print html2
The blog post I cited above mentions that in their case there was a problem with a SubmitControl, so I tried disabling the two SubmitControls on this form.
br.find_control("ctl00$cmdLogin").disabled = True
Disabling cmdLogin generated HTTP Error 500.
br.find_control("ctl00$ContentPlaceHolder1$Button1").disabled = True
Disabling ContentPlaceHolder1$Button1 didn't make any difference. The submit went through, but the page it returned was still page 1 of the search results.
It's worth noting that this site does NOT use "Page$Next."
Can anyone help me figure out what I need to do to get this ASPX form submission to work?
You need to post the hidden values the page gives you (__EVENTVALIDATION, __VIEWSTATE, etc.).
This code will work (note that it uses the awesome Requests library and not Mechanize).
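A sketch of that flow looks like this (it uses BeautifulSoup to pull the hidden fields out of the HTML; the hidden_fields helper is made up for the example, and the GridView name and Page$79 argument come from the question above):

import requests
from bs4 import BeautifulSoup

URL = 'http://bbsmates.com/browsebbs.aspx?BBSName=&AreaCode=314'

def hidden_fields(html):
    # Collect the ASP.NET hidden inputs (__VIEWSTATE, __EVENTVALIDATION, ...)
    # that have to be echoed back on every postback.
    soup = BeautifulSoup(html, 'html.parser')
    return {inp['name']: inp.get('value', '')
            for inp in soup.find_all('input', type='hidden')
            if inp.get('name', '').startswith('__')}

session = requests.Session()
session.headers['User-Agent'] = 'Mozilla/5.0'

# An ordinary GET returns page 1 of the results plus the initial hidden values.
response = session.get(URL)
state = hidden_fields(response.text)

# Fake the GridView pager postback to ask for page 79.
payload = dict(state)
payload['__EVENTTARGET'] = 'ctl00$ContentPlaceHolder1$GridView1'
payload['__EVENTARGUMENT'] = 'Page$79'
response = session.post(URL, data=payload)
print(response.text)

# The hidden values change with every response, so refresh them
# from response.text before posting for the next page.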
When you get to the end of the results (result page 21), you have to pick up the VIEWSTATE and EVENTVALIDATION values again (and do that every 20 pages).
Note that a few of the values you post are empty, and a few carry values. The full list looks something like this:
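Roughly (the exact text-box names have to be read off the page source, so take those as placeholders):

__EVENTTARGET: the GridView control, ctl00$ContentPlaceHolder1$GridView1
__EVENTARGUMENT: Page$79 (or whichever page you want)
__VIEWSTATE, __VIEWSTATEGENERATOR (if the page has one), __EVENTVALIDATION: the values copied from the page you just received
the area-code text box: 314
the BBS-name box and the login boxes: empty strings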
Here is a discussion on the Scraperwiki mailing list on a similar problem: https://groups.google.com/forum/#!topic/scraperwiki/W0Xi7AxfZp0