I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
resource = str(urllib.unquote(resource))
blob_info = blobstore.BlobInfo.get(resource)
self.response.headers['Content-Encoding'] = str('gzip')
# self.response.headers['Content-type'] = str('application/x-gzip')
self.response.headers['Content-type'] = str(blob_info.content_type)
self.response.headers['Content-Length'] = str(blob_info.size)
cd = 'attachment; filename=%s' % (blob_info.filename)
self.response.headers['Content-Disposition'] = str(cd)
self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
self.response.headers['Pragma'] = str(' public')
self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped. The file size of the downloaded data match the .gz file size on the server. Also, I can confirm this by manually gunzipping the downloaded file. I am trying to avoid the manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non Google) server, I could accomplish the automatic gunzip by setting same response headers; albeit, my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates HTML 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, resource):
resource = str(urllib.unquote(resource))
blob_info = blobstore.BlobInfo.get(resource)
file = '/gs/mydatabucket/%s' % blob_info.filename
print file
self.response.headers['Content-Encoding'] = str('gzip')
self.response.headers['Content-Type'] = str('application/x-gzip')
# self.response.headers['Content-Length'] = str(blob_info.size)
cd = 'filename=%s' % (file)
self.response.headers['Content-Disposition'] = str(cd)
self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
self.response.headers['Pragma'] = str(' public')
self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848 bytes file to the client before the server terminate and issued a 500 error. When I issue 'file' on the partially downloaded file from the command line, Mac OS seems to correctly identify the file format as 'SQLite 3.x database' file. Any idea of why the 500 error on the server? How can I fix the problem?
You should first check to see if your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob as is with the proper
content-encoding
andcontent-type
headers, otherwise you need to decompress the blob for the client. You should also verify that your blob'scontent_type
isn'tgzip
(this depends on how you created your blob to begin with!)You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
You may use GCS as easily (if not more easily) as you use the blobstore in AppEngine and it seems to be the preferred storage layer to use going forward. I say this because the File API has been deprecated which made blobstore interaction easier and great efforts and advancements have been made to the GCS libraries making the API similar to the base python file interaction API
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and GAE delivering your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct them to either
storage.googleapis.com/<bucket>/<object>
or<bucket>.storage.googleapis.com/<object>
. Alternatively, if you'd like to have application logic dictate access, you should keep the ACL to the objects private and can use GCS Signed URLs to create short lived URLs to use when doing a 302 redirect.Its worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure proper gzip compressing and content-encoding meta data is used.