HTTP GET: ungzip the response in bash


I need to manually ungzip the response of the following page: http://muaban.net/ho-chi-minh.html

I'm doing

echo -e "GET /ho-chi-minh.html HTTP/1.1\r\nHost: muaban.net\r\nAccept-Encoding: gzip\r\n" | nc muaban.net 80 > response.txt

until the response actually contains a Content-Encoding: gzip or Content-Encoding: deflate header (sometimes it's just plain text), then

cat response.txt | sed '1,14d' | zcat

but it says input is not in gzip format.

Here are the headers:

HTTP/1.1 200 OK
Cache-Control: public, max-age=67
Content-Type: text/html
Content-Encoding: deflate
Expires: Wed, 16 May 2012 15:20:31 GMT
Last-Modified: Wed, 16 May 2012 15:18:31 GMT
Vary: *
Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
X-Proxy: 162
Date: Wed, 16 May 2012 15:19:23 GMT
Content-Length: 12618

There are 3 answers

3
Kevin

An answer to another question indicates that IIS uses the wrong format for deflate. The site in question seems to return either deflate or (the correct) gzip at random, which is why David Souther was able to zcat it (I got gzip once out of several tries). So you'll probably want to fetch the page in a loop until you get a gzipped version, ideally with a delay between attempts and a maximum number of tries.
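The retry idea could be sketched roughly as follows. This is only an illustration: `content_encoding`, `fetch_until_gzip` and `fetch_once` are hypothetical names, `fetch_once` stands in for the nc command from the question, and the default of 5 tries with a 2 second delay is an arbitrary assumption.

```shell
#!/bin/sh
# Print the Content-Encoding value found in the header part of a saved
# raw response (prints nothing if the header is absent).
content_encoding() {
    # Headers end at the first empty line; stop there so the binary
    # body is never parsed.
    LC_ALL=C awk '
        /^\r?$/ { exit }
        tolower($0) ~ /^content-encoding:/ {
            sub(/^[^:]*:[ \t]*/, "")
            sub(/[ \t\r]*$/, "")
            print tolower($0)
            exit
        }' "$1"
}

# Call fetch_once (hypothetical: it must write one raw HTTP response to
# the file named by its first argument) until the response is
# gzip-encoded or the maximum number of tries is used up.
# Arguments: output file, max tries (default 5), delay in seconds (default 2).
fetch_until_gzip() {
    out=$1
    max_tries=${2:-5}
    delay=${3:-2}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        fetch_once "$out" || return 1
        if [ "$(content_encoding "$out")" = "gzip" ]; then
            return 0
        fi
        try=$((try + 1))
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (network access assumed):
#   fetch_once() {
#       printf 'GET /ho-chi-minh.html HTTP/1.1\r\nHost: muaban.net\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n' |
#           nc muaban.net 80 > "$1"
#   }
#   fetch_until_gzip response.txt 10 1 && echo "got a gzip response"
```

Reading the header value instead of grepping the whole file keeps a stray "gzip" somewhere in the body from being mistaken for the encoding.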

0
Mark Adler

See the answer here about the confusion over the meaning of "deflate" as an HTTP content encoding.

It is best to simply not accept deflate and only accept gzip. Then the server won't deliver deflate.
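A gzip-only request and the matching decode step might look like the sketch below. `http_request` and `response_body` are hypothetical helper names, the `Connection: close` header is an addition so that nc sees the connection end, and the body is assumed to arrive unchunked (the response in the question carries a Content-Length).

```shell
#!/bin/sh
# Print a GET request that advertises gzip only. Arguments: host, path.
# "Connection: close" is an addition here so the server ends the
# connection after one response; the final empty line terminates the
# header block.
http_request() {
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n' "$2" "$1"
}

# Print the body of a saved raw response, byte for byte.
# The header size is counted up to and including the first empty line,
# which avoids hard-coding the number of header lines.
response_body() {
    offset=$(LC_ALL=C awk '{ n += length($0) + 1 } /^\r?$/ { print n; exit }' "$1")
    [ -n "$offset" ] || return 1
    # tail -c +N starts output at byte N (1-based).
    tail -c +"$((offset + 1))" "$1"
}

# Usage (network access assumed):
#   http_request muaban.net /ho-chi-minh.html | nc muaban.net 80 > response.txt
#   response_body response.txt | gunzip -c > page.html
```

Cutting at a byte offset rather than with `sed '1,14d'` keeps working if the server adds or drops a header line.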

If you accept deflate, then you must be prepared to try decoding it either as a zlib stream (which is what the HTTP standard specifies) or as a raw deflate stream (which is what Microsoft servers apparently would deliver in error). Then use the one that decoded properly.

Neither the zlib format nor the raw deflate format is gzip, so zcat would not work on either.
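Trying both interpretations could be sketched like this. Since zcat reads neither format, the sketch assumes python3 is installed and leans on its zlib module; `inflate_body` is a hypothetical helper name.

```shell
#!/bin/sh
# Decode a "deflate"-encoded body read from stdin and write the result
# to stdout. The zlib-wrapped format (what HTTP specifies) is tried
# first; if that fails, the data is treated as a raw deflate stream.
# Assumes python3 is available on the PATH.
inflate_body() {
    python3 -c '
import sys, zlib

data = sys.stdin.buffer.read()
try:
    # zlib stream: 2-byte header, deflate data, Adler-32 checksum
    out = zlib.decompress(data)
except zlib.error:
    # raw deflate: negative wbits tells zlib to expect no wrapper
    out = zlib.decompress(data, -zlib.MAX_WBITS)
sys.stdout.buffer.write(out)
'
}

# Usage: feed it only the response body, with the HTTP headers removed:
#   inflate_body < body.bin > page.html
```

If neither attempt succeeds, python3 exits with a non-zero status, so the caller can tell that the body was in neither format.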

0
pizza

You can just set Accept-Encoding to "identity"; that site then returns plain text to you.
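As a rough sketch, such a request could be built as below. `identity_request` is a hypothetical helper name and `Connection: close` is an addition so that nc terminates once the response is complete.

```shell
#!/bin/sh
# Print a GET request asking for an unencoded body. Arguments: host, path.
identity_request() {
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nAccept-Encoding: identity\r\nConnection: close\r\n\r\n' "$2" "$1"
}

# Usage (network access assumed). The body is plain text here, so it is
# safe to strip carriage returns and drop everything up to the first
# empty line:
#   identity_request muaban.net /ho-chi-minh.html | nc muaban.net 80 |
#       tr -d '\r' | sed '1,/^$/d' > page.html
```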