Globally handle Unicode decode errors from webob

I have a large web system built directly on WSGI (no framework is involved) that uses webob to access form data. Every so often we get an unhandled UnicodeDecodeError when a browser (or bot) sends undecodable escape sequences in the query string or POST data. I'm looking for a sensible default behavior that doesn't end with me receiving an unhandled-exception email.

My first idea is to write a site-wide middleware that accesses the params of the webob request object inside an exception handler and returns a 400 (or perhaps strips out the undecodable data).

How do other systems/frameworks handle this?

There is 1 answer

Jeremy

After some digging, I found that you should call the request's .decode() method at the very beginning to create a decoded request. If this fails with a UnicodeDecodeError, I send back a 400. For example:

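    # Inside the WSGI callable (requires `import webob` at module level).
    try:
        # Decode the whole request up front so bad input fails here,
        # not later when the params are first accessed.
        req = webob.Request(environ).decode('ascii')
    except UnicodeDecodeError:
        # Answer with a 400 instead of letting the exception propagate.
        return webob.Response(status=400, body="""
            <h1>Bad Request</h1>
            <p>We apologize. Your request includes characters the server
            cannot understand. Please click the back button and
            check your request for non-standard characters like accent
            marks and copy-paste data from word processing
            programs.</p>""")(environ, start_response)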
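
The site-wide middleware idea from the question could be sketched roughly as follows. This is only an illustration, not part of webob: the names `reject_undecodable` and `BAD_REQUEST_BODY` are made up for this example, and it uses plain WSGI so that it works whether the wrapped application decodes the request eagerly (as above) or lazily. It only catches errors raised while the application callable itself runs; an error raised later, while the server iterates over a lazily generated response, would still escape.

```python
import sys

# Hypothetical error page; the wording is an assumption for this sketch.
BAD_REQUEST_BODY = (
    b"<h1>Bad Request</h1>"
    b"<p>Your request includes characters the server cannot understand.</p>"
)


def reject_undecodable(app):
    """Wrap a WSGI app so a UnicodeDecodeError becomes a 400 response."""

    def middleware(environ, start_response):
        try:
            return app(environ, start_response)
        except UnicodeDecodeError:
            headers = [
                ("Content-Type", "text/html; charset=UTF-8"),
                ("Content-Length", str(len(BAD_REQUEST_BODY))),
            ]
            # Passing exc_info lets the server replace headers the app may
            # already have set, or re-raise if output was already sent.
            start_response("400 Bad Request", headers, sys.exc_info())
            return [BAD_REQUEST_BODY]

    return middleware
```

To use it, wrap the existing application once at startup, for example `application = reject_undecodable(application)`, so every request passes through the handler.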