Are there existing solutions to delete any files older than x days?
How can I delete or purge old files on S3?
72.3k views Asked by Erik AtThere are 6 answers
Here is how to implement it using a CloudFormation template:
JenkinsArtifactsBucket:
Type: "AWS::S3::Bucket"
Properties:
BucketName: !Sub "jenkins-artifacts"
LifecycleConfiguration:
Rules:
- Id: "remove-old-artifacts"
ExpirationInDays: 3
NoncurrentVersionExpirationInDays: 3
Status: Enabled
This creates a lifecycle rule as explained by Ravi Bhatt.
Read more on that: AWS::S3::Bucket Rule
How object lifecycle management works: Managing your storage lifecycle
You can use the following PowerShell script to delete object expired after x days
.
[CmdletBinding()]
Param(
[Parameter(Mandatory=$True)]
[string]$BUCKET_NAME, # Name of the Bucket
[Parameter(Mandatory=$True)]
[string]$OBJ_PATH, # Key prefix of s3 object (directory path)
[Parameter(Mandatory=$True)]
[string]$EXPIRY_DAYS # Number of days to expire
)
$CURRENT_DATE = Get-Date
$OBJECTS = Get-S3Object $BUCKET_NAME -KeyPrefix $OBJ_PATH
Foreach($OBJ in $OBJECTS){
IF($OBJ.key -ne $OBJ_PATH){
IF(($CURRENT_DATE - $OBJ.LastModified).Days -le $EXPIRY_DAYS){
Write-Host "Deleting Object= " $OBJ.key
Remove-S3Object -BucketName $BUCKET_NAME -Key $OBJ.Key -Force
}
}
}
Here is a Python script to delete N-days old files:
from boto3 import client, Session
from botocore.exceptions import ClientError
from datetime import datetime, timezone
import argparse
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--access_key_id', required=True)
parser.add_argument('--secret_access_key', required=True)
parser.add_argument('--delete_after_retention_days', required=False, default=15)
parser.add_argument('--bucket', required=True)
parser.add_argument('--prefix', required=False, default="")
parser.add_argument('--endpoint', required=True)
args = parser.parse_args()
access_key_id = args.access_key_id
secret_access_key = args.secret_access_key
delete_after_retention_days = int(args.delete_after_retention_days)
bucket = args.bucket
prefix = args.prefix
endpoint = args.endpoint
# Get current date
today = datetime.now(timezone.utc)
try:
# create a connection to Wasabi
s3_client = client(
's3',
endpoint_url=endpoint,
access_key_id=access_key_id,
secret_access_key=secret_access_key)
except Exception as e:
raise e
try:
# List all the buckets under the account
list_buckets = s3_client.list_buckets()
except ClientError:
# Invalid access keys
raise Exception("Invalid Access or Secret key")
# Create a paginator for all objects.
object_response_paginator = s3_client.get_paginator('list_object_versions')
if len(prefix) > 0:
operation_parameters = {'Bucket': bucket,
'Prefix': prefix}
else:
operation_parameters = {'Bucket': bucket}
# Instantiate temp variables.
delete_list = []
count_current = 0
count_non_current = 0
print("$ Paginating bucket " + bucket)
for object_response_itr in object_response_paginator.paginate(**operation_parameters):
for version in object_response_itr['Versions']:
if version["IsLatest"] is True:
count_current += 1
elif version["IsLatest"] is False:
count_non_current += 1
if (today - version['LastModified']).days > delete_after_retention_days:
delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})
# Print objects count
print("-" * 20)
print("$ Before deleting objects")
print("$ current objects: " + str(count_current))
print("$ non-current objects: " + str(count_non_current))
print("-" * 20)
# Delete objects 1000 at a time
print("$ Deleting objects from bucket " + bucket)
for i in range(0, len(delete_list), 1000):
response = s3_client.delete_objects(
Bucket=bucket,
Delete={
'Objects': delete_list[i:i + 1000],
'Quiet': True
}
)
print(response)
# Reset counts
count_current = 0
count_non_current = 0
# Paginate and recount
print("$ Paginating bucket " + bucket)
for object_response_itr in object_response_paginator.paginate(Bucket=bucket):
if 'Versions' in object_response_itr:
for version in object_response_itr['Versions']:
if version["IsLatest"] is True:
count_current += 1
elif version["IsLatest"] is False:
count_non_current += 1
# Print objects count
print("-" * 20)
print("$ After deleting objects")
print("$ current objects: " + str(count_current))
print("$ non-current objects: " + str(count_non_current))
print("-" * 20)
print("$ task complete")
And here is how I run it:
python s3_cleanup.py --aws_access_key_id="access-key" --aws_secret_access_key="secret-key-here" --endpoint="https://s3.us-west-1.wasabisys.com" --bucket="ondemand-downloads" --prefix="" --delete_after_retention_days=5
If you want to delete files only from a specific folder, then use the prefix
parameter.
You can use AWS S3 Life cycle rules to expire the files and delete them. All you have to do is select the bucket, click on "Add lifecycle rules" button and configure it and AWS will take care of them for you.
You can refer the below blog post from Joe for step-by-step instructions. It's quite simple actually:
Amazon has introduced object expiration recently.