Hello, I'm cleaning up my computer, so I fed a huge list of files to Handbrake to compress them. After compression, some files are bigger than the originals. I want to clean that up, so I tried to write a small Python script.
Basically I have 2 folders containing files with the same names but different sizes. I want to compare each pair and delete the bigger file, so that if I merge the folders I keep only the smaller version of each file.
Here is an example of the folders I have:
- test/Original
file1.mpg 40Mb
file2.mpg 2Mb
file3.mpg 400Mb
file4.mpg 45Mb
- test/Compressed
file1.mpg 20Mb
file2.mpg 2Mb
file3.mpg 200Mb
file4.mpg 105Mb
At the end of the script I'd like to have this (or a third folder with those merged):
- test/Original
file4.mpg 45Mb
- test/Compressed
file1.mpg 20Mb
file2.mpg 2Mb
file3.mpg 200Mb
I wrote this code and it seems to work, but I'd like to know if there's a better way of doing this. I heard of a function filecompare, but I don't understand whether I can get the file size from it.
Also, I don't understand why I get an indent error if I uncomment the commented-out line.
import os

dirA = 'test/a'
dirB = 'test/b'
merged = []

with os.scandir(dirA) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)

with os.scandir(dirB) as it:
    for entry in it:
        if entry.is_file():
            merged.append(entry)

for i in range(len(merged)):
    # print('-------------iterating over %s' % (merged[i].name,merged[i].stat().st_size/1024**2))
    for j in range(i + 1, len(merged)):
        if str(merged[i].name) == str(merged[j].name):
            print('----DUPLICATE %s %.2f Mb = %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2, merged[j].name, merged[j].stat().st_size/1024**2))
            if merged[i].stat().st_size >= merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[i].name, merged[i].stat().st_size/1024**2))
                os.remove(merged[i])
            elif merged[i].stat().st_size < merged[j].stat().st_size:
                print('removing %s %.2f Mb' % (merged[j].name, merged[j].stat().st_size/1024**2))
                os.remove(merged[j])
Deleting files based on size
This is a simple procedure and can be implemented in one function.
What's going on here?
compare_folders() takes the paths to the two folders being compared as inputs. It then iterates through the contents of each folder and calls the other function, delete_larger_file(), which compares the sizes of 2 files and deletes the larger one.
merge_folders() is needed to merge the folders in place. In other words, it compares the contents of both folders and moves the files that are missing from one into the other. In the end, one folder should be empty and the other one should hold the smallest version of every file.
First call compare_folders(), then call merge_folders().
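For reference, here is a minimal sketch of how the three functions described above could look. Only the function names come from the description. The parameter names, the `_file_names` helper, the return values and the tie-breaking rule are assumptions made for illustration. On a tie the file from the first folder is deleted, which mirrors the `>=` test in the question's script.

```python
import os
import shutil


def _file_names(folder):
    """Return the set of regular-file names directly inside folder (hypothetical helper)."""
    with os.scandir(folder) as it:
        return {entry.name for entry in it if entry.is_file()}


def delete_larger_file(path_a, path_b):
    """Delete the larger of two files and return the path that was kept.

    Assumption: on equal sizes path_a is deleted, like the >= test in the question.
    """
    if os.path.getsize(path_a) >= os.path.getsize(path_b):
        os.remove(path_a)
        return path_b
    os.remove(path_b)
    return path_a


def compare_folders(dir_a, dir_b):
    """For every file name present in both folders, keep only the smaller copy."""
    kept = []
    for name in sorted(_file_names(dir_a) & _file_names(dir_b)):
        kept.append(
            delete_larger_file(os.path.join(dir_a, name), os.path.join(dir_b, name))
        )
    return kept


def merge_folders(src, dst):
    """Move every remaining file from src into dst, leaving src empty of files."""
    for name in sorted(_file_names(src)):
        target = os.path.join(dst, name)
        # After compare_folders() no name should exist in both folders;
        # refuse to overwrite if one still does.
        if os.path.exists(target):
            raise FileExistsError(target)
        shutil.move(os.path.join(src, name), target)
```

With the folders from the question, the calls would be compare_folders('test/Original', 'test/Compressed') followed by merge_folders('test/Original', 'test/Compressed'). As for the filecmp module: filecmp.cmp() only reports whether two files are equal, it does not return sizes, so os.path.getsize() (or the st_size field you already use) is the right tool here.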