How to use Python's difflib to produce side-by-side comparison of two files similar to Unix sdiff command?


I am using Python 2.6 and I want to create a simple GUI with two side-by-side text panes comparing two text files (file1.txt and file2.txt).

I am using difflib, but it is not clear to me how to produce a result similar to the Unix sdiff command.

In order to reproduce a side-by-side comparison, I need difflib to return two variables file1_diff and file2_diff, for instance.

I have also considered using the sdiff output directly and parsing it to separate the panes, but that turned out not to be as easy as it seems. Any hints?

There are 4 answers

4
Hett On

I tried diffing the files with difflib.context_diff, where fromlines and tolines are the lists of lines read from each file:

import difflib
import sys
diff = difflib.context_diff(fromlines, tolines, fromfile='file1.txt', tofile='file2.txt')
sys.stdout.writelines(diff)

In this case your output will be something like this:

*** file1.txt
--- file2.txt
***************
*** 1,6 ****
! aasdf
  qwer
  123
! poiu
! xzcv34
  xzcv
--- 1,6 ----
! asdf
  qwer
+ mnbv
  123
! cvnn
  xzcv

With this format you can easily separate the diff for each file, but I'm not sure whether the output of context_diff will satisfy you. You haven't mentioned how you're using difflib.
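To illustrate the separation step, here is a minimal sketch that splits the context_diff output into the two halves of each hunk. The function name split_context_diff is hypothetical, and the sketch assumes the input lines carry no trailing newlines (hence lineterm=""). The marker prefixes ("! ", "+ ", "- ", "  ") are left in place so the caller can decide how to render them.

```python
import difflib

def split_context_diff(fromlines, tolines):
    """Split context_diff output into (left, right) lists of marked lines.

    Hypothetical helper: assumes the input lines have no trailing newlines.
    """
    left, right = [], []
    target = None  # list currently receiving hunk lines
    for line in difflib.context_diff(fromlines, tolines, lineterm=""):
        if line.startswith("***************"):
            target = None          # hunk separator
        elif line.startswith("*** ") and line.endswith(" ****"):
            target = left          # "*** 1,6 ****" opens the file 1 part
        elif line.startswith("--- ") and line.endswith(" ----"):
            target = right         # "--- 1,6 ----" opens the file 2 part
        elif target is not None:
            target.append(line)    # content line, marker prefix kept
    return left, right

left, right = split_context_diff(
    ["aasdf", "qwer", "123", "poiu", "xzcv34", "xzcv"],
    ["asdf", "qwer", "mnbv", "123", "cvnn", "xzcv"],
)
```

Note that context_diff leaves one side of a hunk empty when that side has no changed lines (for example, a pure insertion has no file 1 lines), so the two lists are not guaranteed to have the same length.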

1
Jérémie On

Building on @Bryan Oakley's answer, I wrote a quick Gist:

https://gist.github.com/jlumbroso/3ef433b4402b4f157728920a66cc15ed

It provides a side-by-side diff function (plus the helper that lays out the two columns using the textwrap library) that you can call on two lists of lines:

print(better_diff(
    ["a", "c",      "a", "a", "a", "a",      "a", "a", "e"],
    ["a", "c", "b", "a", "a", "a", "a", "d", "a", "a"],
    width=20,
    as_string=True,
    left_title="  LEFT",
))

will produce:

  LEFT   | 
-------- | --------
a        | a
c        | c
         | b
a        | a
a        | a
a        | a
a        | a
         | d
a        | a
a        | a
e        | 
0
Bryan Oakley On

You can use difflib.Differ to return a single sequence of lines with a marker at the start of each line which describes the line. The markers tell you the following information about the line:

Marker  Description
'- '    line unique to file 1
'+ '    line unique to file 2
'  '    line common to both files
'? '    line not present in either input file

You can use this information to decide how to display the data. For example, if the marker is a space, put the line in both the left and right widgets. If it's +, put a blank line on the left and the actual line on the right, showing that the line is unique to the text on the right. Likewise, - means the line is unique to the left.

For example, you can create two text widgets t1 and t2, one for the left and one for the right. You can compare two files by creating a list of lines for each and then passing them to the compare method of the differ and then iterating over the results.

t1 = tk.Text(...)
t2 = tk.Text(...)

import difflib

with open("file1.txt", "r") as f:
    f1 = f.readlines()
with open("file2.txt", "r") as f:
    f2 = f.readlines()

differ = difflib.Differ()
for line in differ.compare(f1, f2):
    marker = line[0]
    if marker == " ":
        # line is same in both
        t1.insert("end", line[2:])
        t2.insert("end", line[2:])

    elif marker == "-":
        # line is only on the left
        t1.insert("end", line[2:])
        t2.insert("end", "\n")

    elif marker == "+":
        # line is only on the right
        t1.insert("end", "\n")
        t2.insert("end", line[2:])

The above code ignores lines with the marker ? since those are extra lines that attempt to bring attention to the different characters on the previous line and aren't actually part of either file. You could use that information to highlight the individual characters if you wish.
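
As a rough sketch of that last idea, the helper below compares one old line with one new line and collects the column positions flagged by the ? guide lines. The name changed_columns is hypothetical. It relies on the guide characters lining up under the columns of the line they follow, and it returns empty lists when Differ considers the two lines too different to emit a guide.

```python
import difflib

def changed_columns(old_line, new_line):
    """Return (old_cols, new_cols): columns flagged by Differ's '? ' lines.

    Hypothetical helper for illustration only.
    """
    old_cols, new_cols = [], []
    previous = None  # marker of the most recent '-' or '+' line
    for line in difflib.Differ().compare([old_line], [new_line]):
        marker = line[0]
        if marker in "-+":
            previous = marker
        elif marker == "?" and previous is not None:
            # Non-blank guide characters sit under the changed columns
            cols = [i for i, ch in enumerate(line[2:]) if not ch.isspace()]
            if previous == "-":
                old_cols = cols
            else:
                new_cols = cols
    return old_cols, new_cols

print(changed_columns("poiu\n", "poix\n"))
```

In a Tk text widget, these column numbers could then be turned into "row.column" indexes and passed to tag_add to highlight the characters.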

0
pschanely On

How about something like this?

>>> import difflib
>>> a = ['cat', 'dog', 'horse']
>>> b = ['cat', 'horse', 'chicken']
>>> comparison = list(l for l in difflib.Differ().compare(a,b) if not l.startswith('?'))
>>> left = [l[2:] if l.startswith((' ', '-')) else '' for l in comparison]
>>> right = [l[2:] if l.startswith((' ', '+')) else '' for l in comparison]
>>> left
['cat', 'dog', 'horse', '']
>>> right
['cat', '', 'horse', 'chicken']
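
A possible variation, sketched here under the assumption of Python 3 (itertools.zip_longest), uses difflib.SequenceMatcher.get_opcodes instead of Differ. This swaps in a different technique from the one above: replaced lines end up on the same row, which is closer to how sdiff shows changed lines. The function name side_by_side and the fill parameter are made up for this example.

```python
import difflib
from itertools import zip_longest

def side_by_side(a, b, fill=""):
    """Align two lists of lines into equal-length left/right columns.

    Hypothetical helper built on SequenceMatcher opcodes.
    """
    left, right = [], []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' and 'replace' pair lines up; for 'delete' and 'insert'
        # one slice is empty, so zip_longest pads that side with `fill`.
        for old, new in zip_longest(a[i1:i2], b[j1:j2], fillvalue=fill):
            left.append(old)
            right.append(new)
    return left, right

left, right = side_by_side(['cat', 'dog', 'horse'], ['cat', 'horse', 'chicken'])
```

For the cat/dog/horse input this yields the same two lists as the session above; the difference only shows up when a block of lines is replaced rather than purely added or removed.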