python gzip file in memory and upload to s3


I am using Python 2.7.

I am trying to cat two log files, pull out the data for specific dates with sed, then compress the result and upload it to S3 without creating any temp files on the system.

sed_command = "sed -n '/{}/,/{}/p'".format(last_date, last_date)

Flow:

  1. cat the two files (example: cat file1 file2).
  2. Run the sed manipulation in memory.
  3. Compress the result in memory with zip or gzip.
  4. Upload the compressed result from memory to S3.

I have successfully done this by creating temp files on the system and removing them once the upload to S3 completes, but I could not find a working solution that does everything on the fly without creating any temp files. A rough sketch of how I picture the first two steps running in memory is below.
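For reference, this is roughly how I imagine the cat and sed steps running through pipes only; filter_logs is just a hypothetical helper, and the file paths and last_date are placeholders:

import subprocess

def filter_logs(path1, path2, last_date):
    # Concatenate the two log files and run them through the same sed
    # range filter, entirely via pipes -- nothing is written to disk.
    sed_command = "sed -n '/{}/,/{}/p'".format(last_date, last_date)
    cat = subprocess.Popen(['cat', path1, path2], stdout=subprocess.PIPE)
    sed = subprocess.Popen(sed_command, shell=True,
                           stdin=cat.stdout, stdout=subprocess.PIPE)
    cat.stdout.close()  # let cat receive SIGPIPE if sed exits early
    filtered, _ = sed.communicate()
    return filtered

# e.g. filtered_text = filter_logs('file1', 'file2', last_date)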


There are 2 answers

systemjack (accepted answer)

Here's the gist of it:

import sys, gzip, cStringIO
import boto.s3.connection, boto.s3.key

conn = boto.s3.connection.S3Connection(aws_key, secret_key)
bucket = conn.get_bucket(bucket_name, validate=True)

# Compress whatever arrives on stdin into an in-memory buffer.
buffer = cStringIO.StringIO()
writer = gzip.GzipFile(None, 'wb', 6, buffer)
writer.write(sys.stdin.read())
writer.close()

# Rewind and upload the gzipped bytes straight from memory.
buffer.seek(0)
boto.s3.key.Key(bucket, key_path).set_contents_from_file(buffer)
buffer.close()
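The script reads everything from stdin, so it is meant to sit at the end of a shell pipeline that does the cat and sed steps. For very large logs, one possible refinement (not part of the original answer) is to compress the stream in chunks so only the gzipped bytes are held in memory in full; gzip_stream_to_buffer below is just a hypothetical sketch of that:

import gzip

def gzip_stream_to_buffer(stream, buffer, chunk_size=64 * 1024):
    # Compress a file-like stream (e.g. sys.stdin or a subprocess pipe)
    # into the supplied in-memory buffer, one chunk at a time.
    writer = gzip.GzipFile(None, 'wb', 6, buffer)
    chunk = stream.read(chunk_size)
    while chunk:
        writer.write(chunk)
        chunk = stream.read(chunk_size)
    writer.close()

# Used in place of the single writer.write(sys.stdin.read()) call above;
# the buffer is then rewound and uploaded exactly as in the answer.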
0e1val

Kind of a late answer, but I recently published a package that does just that; it's installable from PyPI:

    pip install aws-logging-handlers

You can find usage documentation in the project's Git repository.