Python program to compare two files and show the differences


I have the following code to compare two files. I would like it to work on files as big as 4 or 5 MB, but when I point it at files that size, the cursor in the Python console just blinks and no output is shown. Once I ran it for the whole night, and the next morning it was still blinking. What can I change in this code?

import difflib

# read both files completely and compare them line by line
with open('/home/michel/Documents/first.csv', 'r') as file1, \
     open('/home/michel/Documents/second.csv', 'r') as file2:
    diff = difflib.ndiff(file1.readlines(), file2.readlines())
    delta = ''.join(diff)

print(delta)

There are 2 answers

Answer by umut

If you are on a Linux-based system, you can call the external diff command and use its output. I tried diff on two files of 14 MB and 9.3 MB, and it took 1.3 seconds:

real    0m1.295s
user    0m0.056s
sys     0m0.192s
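A minimal sketch of calling the external diff from Python with the standard subprocess module, assuming a diff executable is on the PATH (as on typical Linux systems). The function name external_diff is hypothetical. diff exits with status 0 when the files are identical, 1 when they differ, and 2 on an error, so only a status above 1 is treated as a failure here.

```python
import subprocess


def external_diff(path1, path2):
    """Return the output of the system `diff` command for two files.

    Hypothetical helper; assumes a `diff` executable is on the PATH.
    """
    result = subprocess.run(
        ['diff', path1, path2],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
    )
    # diff exit status: 0 = identical, 1 = different, 2 = trouble
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

For example, print(external_diff('/home/michel/Documents/first.csv', '/home/michel/Documents/second.csv')) prints the differences in diff's normal format, or nothing if the files are identical.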
Answer by Nima Soroush

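When I tried to use difflib your way, I had the same issue: with big files, both files are read completely into memory and difflib then compares them in one go. Its matching algorithm (SequenceMatcher) can take quadratic time in the worst case, so the running time grows much faster than the file size. As a solution, you can compare the two files piece by piece. Here I do it for every 100 lines.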

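import difflib
from itertools import zip_longest

CHUNK_SIZE = 100  # number of lines compared per block

with open('1.csv', 'r') as file1, open('2.csv', 'r') as file2:
    lines_file1 = []
    lines_file2 = []

    # i: line number (starting at 1)
    # line1, line2: content of that line in each file;
    # zip_longest yields None once the shorter file runs out,
    # so trailing lines of the longer file are not silently dropped
    for i, (line1, line2) in enumerate(zip_longest(file1, file2), start=1):
        if line1 is not None:
            lines_file1.append(line1)
        if line2 is not None:
            lines_file2.append(line2)
        # every CHUNK_SIZE lines, show the differences for that block
        if i % CHUNK_SIZE == 0:
            # ndiff expects lists of lines; joining them into one string
            # would make it compare character by character
            print(''.join(difflib.ndiff(lines_file1, lines_file2)), end='')
            lines_file1 = []
            lines_file2 = []

    # show the differences for any lines left over
    if lines_file1 or lines_file2:
        print(''.join(difflib.ndiff(lines_file1, lines_file2)), end='')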
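The same chunked idea can also be wrapped in a reusable generator. This is a sketch, not the answer's exact code: the name diff_in_chunks and its chunk_size parameter are hypothetical, and itertools.islice is used to pull a fixed number of lines from each input at a time, so neither file is read into memory all at once. Like the loop above, it only gives useful output when the two files stay line-aligned: a single inserted or deleted line shifts every later chunk, so all following chunks will show up as changed.

```python
import difflib
from itertools import islice


def diff_in_chunks(lines1, lines2, chunk_size=100):
    """Yield ndiff output for successive blocks of chunk_size lines.

    Hypothetical helper. lines1 and lines2 can be any iterables of lines,
    for example open file objects.
    """
    it1, it2 = iter(lines1), iter(lines2)
    while True:
        # read the next block from each input; islice stops early at the end
        block1 = list(islice(it1, chunk_size))
        block2 = list(islice(it2, chunk_size))
        if not block1 and not block2:
            break  # both inputs are exhausted
        # ndiff takes lists of lines and yields lines prefixed with
        # '- ', '+ ', '  ' or '? '
        yield from difflib.ndiff(block1, block2)
```

With f1 and f2 opened on 1.csv and 2.csv, sys.stdout.writelines(diff_in_chunks(f1, f2)) writes the differences block by block as they are computed.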

Hope it helps.