I am writing a function for some existing Python code that will be passed a Mechanize browser object as a parameter.
I fill in some details in a form in the browser and call response = browser.submit() to move the browser to a new page and collect some information from it.
Unfortunately, I occasionally get the following error:
httperror_seek_wrapper: HTTP Error 500: Internal Server Error
I've navigated to the page in my own browser, and sure enough, I occasionally see this error directly, so I think this is a server problem, not anything to do with robots.txt, headers or similar.
The problem is that after submitting, the state of the browser object changes and I can't continue to use it. My first thought was to take a deep copy first and use that if I ran into problems, but that gives the error TypeError: object.__new__(cStringIO.StringO) is not safe, use cStringIO.StringO.__new__() as described here.
I've also tried using browser.back() but get NoneType errors.
Does anyone have a good solution to this?
Solution (with thanks to karnesJ.R below):
A great solution below uses the excellent requests library (docs here). requests can send the form data directly via POST or GET, which, importantly, doesn't change the state of the br object.
An excellent website allows us to test various error codes, and has a form interface at the top that I've tested this on. I create a br object at this site, then define a function that selects the form from br and pulls out the relevant information, but does the submit via requests, so that the br object hasn't changed and is re-usable. An error code means requests returns a response with nothing useful in it, but it doesn't render the br object unusable.
As stated below, this involves a little more setup time, but is well worth it.
import mechanize
import requests

def testErrorCodes(br, theCodes):
    for x in theCodes:
        # Select the first form on the page and read its target URL
        br.select_form(nr=0)
        theAction = br.action
        payload = {'code': x}
        # Submit with requests so the state of br is left untouched
        response = requests.post(theAction, data=payload)
        print(response.status_code)

br = mechanize.Browser()
br.set_handle_robots(False)
response = br.open("http://savanttools.com/test-http-status-codes")
testErrorCodes(br, [401, 402, 403, 404, 500, 503, 504])  # Prints the error codes
testErrorCodes(br, [404])  # The browser is still alive and well to be used again!
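Since the 500 error only shows up occasionally, one possible extension is to retry the submission a few times before giving up. The following is only a sketch, not part of the original solution: submit_with_retry, its send parameter, and the retries and delay defaults are all hypothetical names and values chosen for illustration. It assumes send behaves like requests.post, i.e. it accepts a URL and a data keyword and returns an object with a status_code attribute.

```python
import time


def submit_with_retry(send, url, payload, retries=3, delay=1.0):
    """Call send(url, data=payload) until the status is below 500.

    `send` is assumed to be a callable shaped like requests.post.
    Returns the last response received, whether or not it succeeded.
    """
    response = None
    for attempt in range(retries):
        response = send(url, data=payload)
        # Anything below 500 is not a server-side failure, so stop retrying
        if response.status_code < 500:
            break
        # Wait before the next attempt, but not after the final one
        if attempt < retries - 1:
            time.sleep(delay)
    return response
```

Inside testErrorCodes this would be called as submit_with_retry(requests.post, theAction, payload). Because the call never touches br, the browser object stays usable no matter how many attempts fail.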
It's been a while since I've written Python, but I think I have a workaround for your problem. Try this method:
You can find documentation about the requests library here. I personally think that requests is better for your case than mechanize... but it does require a little more overhead from you, in that you need to break down the submission to a raw POST using some kind of RESTful interceptor in your browser. Ultimately though, by passing in br you are restricting yourself to the way that mechanize handles browser states on br.submit().