I am trying to verify whether an online radio URL is delivering music and whether the URL was redirected (this happens if, for some reason, the requested URL is wrong or not active). I found some advice in "Fetching url in python with google app engine". However, for a URL that delivers Content-Type: audio/mpeg it doesn't seem to work.
On my local machine, using Python 2.7.6 and urllib2.urlopen, everything is fine:
import urllib2

try:
    print "begin urlopen"
    url = urllib2.urlopen("http://streaming.radionomy.com/jamaican-roots-radio")
    print "end urlopen"
except Exception, e:
    print e
gives
begin urlopen
end urlopen
I can then read N bytes from the returned object (which is a socket._fileobject) and use the geturl() method to get the actual URL the stream is coming from (if there was no redirection, the requested URL and the retrieved resource URL are the same).
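The local check described above can be sketched roughly as follows. This is only an illustration: probe_stream and sample_size are hypothetical names, and the function assumes it is handed an already-open, urllib2-style response (something with geturl(), info(), read() and close()). It never calls read() without a size, since that would not return on an endless stream.

```python
def probe_stream(response, requested_url, sample_size=1024):
    """Inspect an already-open, urllib2-style response without draining it.

    `response` is assumed to offer geturl(), info(), read(n) and close(),
    as the object returned by urllib2.urlopen() does.
    """
    final_url = response.geturl()
    content_type = response.info().get("Content-Type", "") or ""
    try:
        # Bounded read: a stream never ends, so always pass a size.
        sample = response.read(sample_size)
    finally:
        response.close()
    return {
        "redirected": final_url != requested_url,
        "is_audio": content_type.lower().startswith("audio/"),
        "got_data": len(sample) > 0,
        "final_url": final_url,
    }

# Possible use on a local machine (Python 2.7), not on dev_appserver.py:
#   import urllib2
#   url = "http://streaming.radionomy.com/jamaican-roots-radio"
#   result = probe_stream(urllib2.urlopen(url, timeout=10), url)
```

Note that a plain string comparison also reports cosmetic differences (a trailing slash, for instance) as a redirect, so a real check may want to normalise both URLs first.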
The problems arise when using dev_appserver.py for Google App Engine (I haven't deployed yet). The call never returns:
begin urlopen
WARNING 2015-06-12 14:31:43,599 urlfetch_stub.py:504] Stripped prohibited headers from URLFetch request: ['Host']
and "end urlopen" is never printed.
I understand the warning, so I switched (as suggested in the link above) to urlfetch:
from google.appengine.api import urlfetch

try:
    print "begin fetch"
    url = urlfetch.fetch("http://streaming.radionomy.com/jamaican-roots-radio")
    print "end fetch"
except Exception, e:
    print e
gives
begin fetch
The warning is gone, but again the call doesn't return.
For a normal web page URL, everything works as expected. I guess the problem is that the response never finishes. Also using
urlfetch.set_default_fetch_deadline(5)
doesn't change the situation, probably because the data is streamed continuously from the server (and therefore no timeout is triggered?). I also tried the low-level httplib.HTTPConnection, but after making the request, getresponse() never returns.
For my purposes, the response headers would be enough. But on the server (which is not under my control) the HEAD method is not implemented (despite being listed in Access-Control-Allow-Methods, as can be seen from a browser):
curl -X HEAD -i http://streaming.radionomy.com/jamaican-roots-radio
HTTP/1.0 501 Not Implemented
I didn't find any question on Stack Overflow covering the case of a stream URL except this one: "How to call Twitter's Streaming/Filter Feed with urllib2/httplib?". Unfortunately, the suggested answer is not very helpful for me ("Using Twitter's 'standard' API").
Any idea how I can solve this problem?
UPDATE
On Google App Engine itself (deployed, not on dev_appserver.py as above) the problems are similar:
- with a deadline of 5 sec
Deadline exceeded while waiting for HTTP response from URL...
- with a deadline of 60 sec
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
    result = handler(dict(self._environ), self._StartResponse)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~radiosnoozers/3.384985169499124712/controllers/checkurl.py", line 80, in get
    print e
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/request_environment.py", line 94, in write
    self._request.errors.write(data)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/logservice/logservice.py", line 287, in write
    self._write(line)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/logservice/logservice.py", line 307, in _write
    if self._request != logsutil.RequestID():
DeadlineExceededError
The timeout is respected, and using allow_truncated=True makes no difference. In either case, I get no access to the response...
I really don't know what is going on, but thanks for the given suggestions.
UrlFetch is meant for fetching a finite resource from a URL, and generally doesn't play nice with streams. It's waiting for the request to terminate. I believe that the endpoint doesn't play well with Range requests in general. Look at the headers when my browser hits that stream (great stream, by the way):
And now take a look at the response:
In fact, as I hinted above, I think the stream itself is not playing nice with HTTP. If you try to run an equivalent request via curl and specify Range: bytes=0-100, you'll notice that the Range request header isn't respected, and it'll stream forever. So, it seems you'll need to use a Managed VM or Compute Engine instance to manually open and close the connection.
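As a rough illustration of "manually open and close the connection", here is a minimal sketch, assuming an environment where raw outbound sockets are available (such as the Compute Engine instance mentioned above). The function names, the 16 KB cap and the 5 second timeout are all made up for the example. It sends a plain GET, reads only until the blank line that ends the headers, and then closes the socket so the endless body is never consumed.

```python
import socket

def parse_response_head(raw):
    """Split raw response bytes into (status_code, lower-cased headers dict)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) < 2:
        raise ValueError("no status line in response")
    status_code = int(parts[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status_code, headers

def fetch_head_only(host, path, port=80, timeout=5.0, max_bytes=16384):
    """GET a resource but stop reading as soon as the headers are complete."""
    sock = socket.create_connection((host, port), timeout)
    try:
        request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        sock.sendall(request.encode("ascii"))
        raw = b""
        # Stop at the end of the headers, or at max_bytes as a safety cap.
        while b"\r\n\r\n" not in raw and len(raw) < max_bytes:
            chunk = sock.recv(1024)
            if not chunk:
                break
            raw += chunk
    finally:
        sock.close()  # closing here is what keeps the call from hanging
    return parse_response_head(raw)

# Possible use:
#   status, headers = fetch_head_only("streaming.radionomy.com",
#                                     "/jamaican-roots-radio")
```

A 301 or 302 status with a Location header would then indicate a redirect, and the Content-Type header would show whether the final URL serves audio. The sketch does not follow redirects itself; that would mean calling it again with the host and path taken from Location.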