I'm coding against a spec in which encoded characters--slashes in particular--are significant. However, try as I might, I can't find a way to access the URI before the encoded characters are decoded. I'm using Werkzeug, but to pare it back to a simple example, if I run:
from wsgiref.util import request_uri
from wsgiref.simple_server import make_server

def app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    start_response(status, headers)
    # A WSGI response body must be an iterable of bytes.
    return [(request_uri(environ) + '\n').encode('utf-8')]

make_server('', 5000, app).serve_forever()
and then:
me@here:~ $ curl "http://localhost:5000/abc%2F123/foo"
http://localhost:5000/abc/123/foo
as you see, the %2F is already decoded to a /. I've looked deeper into the environment, but every way I've found to access the URI or parts thereof behaves this way. Is there something I'm missing?
Some WSGI servers make it available as the REQUEST_URI value in the WSGI environ dictionary. Relying on it is risky, however: it is the original raw form as sent by the client, with none of the normalisations a web server would normally apply to clean it up. You would have to replicate those normalisations yourself, which can be tricky, and your code would no longer be portable, since only some WSGI servers provide the value.
The general situation is that under WSGI there isn't really a good way to do what you want. If you want to know more, I suggest digging through the Python WEB-SIG mailing list, where this has been discussed in the past.
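For illustration only, here is a minimal sketch of reading the raw path from the environ when a server happens to provide it. The helper name raw_path is hypothetical, and the RAW_URI key is an assumption (some servers appear to use it instead of REQUEST_URI); neither key is part of the WSGI specification, so check what your server actually sets. No normalisation is done here, and the fallback cannot recover an encoded slash that has already been decoded.

```python
from urllib.parse import quote, urlsplit

def raw_path(environ):
    """Return the undecoded request path if the server exposes it (hypothetical helper)."""
    # Non-standard keys; REQUEST_URI and RAW_URI are assumptions about the server.
    for key in ("REQUEST_URI", "RAW_URI"):
        raw = environ.get(key)
        if raw:
            if raw.startswith("/"):
                # Origin-form request target: drop the query string only.
                return raw.split("?", 1)[0]
            # Absolute-form request target: keep just the path component.
            return urlsplit(raw).path
    # Fallback: re-encode the already decoded path. This is lossy, because
    # a %2F in the original request is indistinguishable from a real slash.
    decoded = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    # WSGI on Python 3 passes path bytes as latin-1 decoded strings.
    return quote(decoded.encode("latin-1"))
```

Inside the app from the question, raw_path(environ) would be called in place of request_uri(environ). When the fallback branch is taken, the result is the same decoded path the question already shows.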