I am trying to get all the files of a specified size within a folder of an S3 bucket. How do I iterate through the bucket, filter the files by that size, and return the names of the files that match?

import boto3

s3 = boto3.client('s3')
s3.list_objects_v2(Bucket='my-images')

A sample of the output (entries from the Contents list) is:

   u'Key': u'detail/01018535.jpg',
   u'LastModified': datetime.datetime(2019, 1, 23, 0, 48, 41, tzinfo=tzlocal()),
   u'Size': 13535,
   u'StorageClass': 'STANDARD'},
  {u'ETag': '"cd65991a1c6f118e8b036208a30028a7"',
   u'Key': u'detail/0119AF2.jpg',
   u'LastModified': datetime.datetime(2019, 1, 10, 17, 17, tzinfo=tzlocal()),
   u'Size': 12984,
   u'StorageClass': 'STANDARD'}

For instance, say I search for a size of 12984. The search should then return that object's 'Key' (here, detail/0119AF2.jpg).
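A minimal sketch of just the filtering step, applied to the Contents list of a single list_objects_v2 response. The helper name keys_with_size is hypothetical, and the sample data is the two entries shown above with the other fields omitted. Keep in mind that one list_objects_v2 call returns at most 1,000 objects, so a larger folder needs pagination, as in the answers below.

```python
# Hypothetical helper: pick out the keys whose Size matches from a list_objects_v2 "Contents" list.
def keys_with_size(contents, size):
    """Return the 'Key' of every object in `contents` whose 'Size' equals `size` (in bytes)."""
    return [obj["Key"] for obj in contents if obj.get("Size") == size]


# The two entries from the sample output above (ETag/LastModified omitted).
sample_contents = [
    {"Key": "detail/01018535.jpg", "Size": 13535, "StorageClass": "STANDARD"},
    {"Key": "detail/0119AF2.jpg", "Size": 12984, "StorageClass": "STANDARD"},
]

print(keys_with_size(sample_contents, 12984))  # ['detail/0119AF2.jpg']

# With boto3 (not run here); Prefix restricts the listing to the "folder", and
# .get("Contents", []) covers responses with no objects, which omit Contents:
#   response = boto3.client("s3").list_objects_v2(Bucket="my-images", Prefix="detail/")
#   keys_with_size(response.get("Contents", []), 12984)
```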

2 Answers

rajesh

If you want to use boto3, here is a function I use to find zero-byte objects. You can adapt it to your needs by filtering on a specific size instead of 0.

import boto3


def get_empty_objects(bucket_name, prefixes):
    """
    Return the s3:// URIs of all zero-byte objects under the given prefixes.
    No Delimiter is passed, so every key under a prefix is listed (recursive).
    """
    s3client = boto3.client('s3')  # create the client once, not once per prefix
    paginator = s3client.get_paginator("list_objects_v2")
    results = []
    for prefix in prefixes:
        paginator_result = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
        try:
            # search('Contents') yields None for a page with no objects, so skip it
            for obj in paginator_result.search('Contents'):
                if obj is not None and obj['Size'] == 0:
                    results.append(f"s3://{bucket_name}/{obj['Key']}")
        except Exception as err:
            print(f">>> Error processing objects of [s3://{bucket_name}/{prefix}] - {err}")
        # note: len(results) is the running total across all prefixes so far
        print(f">>> Returning {len(results)} objects for [s3://{bucket_name}/{prefix}]")
    return results

Usage:

get_empty_objects("mybucket", ["prefix1/", "prefix2/"])
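Following the suggestion to filter on a specific size, here is a minimal sketch of that tweak. The name get_objects_of_size is hypothetical, and, as an assumption made for easier testing, the S3 client is passed in rather than created inside the function. It iterates the pages directly and reads page.get("Contents", []) because a page for a prefix with no objects has no Contents key.

```python
def get_objects_of_size(s3client, bucket_name, prefixes, size):
    """
    Hypothetical variant of get_empty_objects: return the s3:// URIs of objects
    under each prefix whose Size equals `size` bytes.
    `s3client` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = s3client.get_paginator("list_objects_v2")
    results = []
    for prefix in prefixes:
        # paginate() keeps requesting pages until every key under the prefix is listed
        for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
            # a page with no objects has no 'Contents' key at all
            for obj in page.get("Contents", []):
                if obj["Size"] == size:
                    results.append(f"s3://{bucket_name}/{obj['Key']}")
    return results
```

For the question's example this would be called as get_objects_of_size(boto3.client('s3'), 'my-images', ['detail/'], 12984).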
John Rotenstein

With the AWS CLI, you can use a --query expression:

aws s3api list-objects-v2 --bucket my-images --query 'Contents[?Size==`12984`].[Key]' --output text

I put [Key] in square brackets to force each one to appear on a separate line.

This syntax works on a Mac command line. On Windows you might need different quote marks.

For tips about using such expressions, see: JMESPath Tutorial