Finding the diff of two lists of strings


I'm trying to find a diff (longest common subsequence) between two lists of strings. I'm guessing difflib could be useful here, but difflib.ndiff annotates the output with -, +, etc. For instance:

from difflib import ndiff
t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
d = list(ndiff(t1, t2))
print(d)

['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']

Is tokenising and removing the letter-codes in the output the right way? Is this the proper Pythonic way of diffing lists?
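
For reference, the "remove the letter-codes" idea could be sketched roughly as below. This is only an illustration and assumes each ndiff line starts with a two-character code ('  ', '- ', '+ ' or '? '); the names common, removed and added are made up for the example.

```python
from difflib import ndiff

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()

common, removed, added = [], [], []
for line in ndiff(t1, t2):
    # The first two characters are the code, the rest is the original line.
    code, text = line[:2], line[2:]
    if code == '  ':
        common.append(text)    # present in both lists
    elif code == '- ':
        removed.append(text)   # only in t1
    elif code == '+ ':
        added.append(text)     # only in t2
    # '? ' lines are intraline hints and are skipped here

print(common)   # ['one 1', 'three 3']
print(removed)  # ['two 2']
print(added)    # ['two 29']
```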


There are 3 answers

Anand S Kumar On

If all you want is the elements of the first list that are not in the second, you can convert both lists to sets and take the set difference using the - operator.

Example:

>>> l1 = [1,2,3,4,5]
>>> l2 = [4,5,6,7,8]
>>> print(list(set(l1) - set(l2)))
[1, 2, 3]
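
One caveat, shown as a small sketch with made-up lists: sets keep neither order nor duplicates, so a set difference cannot tell you where a line changed or how many times it occurs.

```python
a = ['one 1', 'two 2', 'two 2', 'three 3']
b = ['three 3', 'one 1']

# 'two 2' occurs twice in a, but the set difference reports it once.
only_in_a = set(a) - set(b)
print(only_in_a)  # {'two 2'}

# Same lines in a different order: both differences are empty,
# so the reordering is invisible to this approach.
c = ['three 3', 'one 1']
d = ['one 1', 'three 3']
print(set(c) - set(d), set(d) - set(c))  # set() set()
```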
Vivek Sable On

Using a list comprehension:

In [16]: l1 = ['a', 'b', 'c', 'd']

In [17]: l2 = ['a', 'x', 'y', 'c']

In [18]: l1_l2 = [ii for ii in l1 if ii not in l2]

In [19]: l1_l2
Out[19]: ['b', 'd']

In [20]: l2_l1 = [ii for ii in l2 if ii not in l1]

In [21]: l2_l1 
Out[21]: ['x', 'y']
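
For longer lists, a possible variation (a sketch, assuming the items are hashable, which strings are) is to build a set once for the membership test. The comprehension still walks the original list, so order and duplicates in the result are preserved; the names s1 and s2 are just for illustration.

```python
l1 = ['a', 'b', 'c', 'd']
l2 = ['a', 'x', 'y', 'c']

# Sets make each "not in" check roughly constant time instead of a list scan.
s1, s2 = set(l1), set(l2)

l1_l2 = [ii for ii in l1 if ii not in s2]
l2_l1 = [ii for ii in l2 if ii not in s1]

print(l1_l2)  # ['b', 'd']
print(l2_l1)  # ['x', 'y']
```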

saikiran On
l1 = ["a", "b", "c"]

l2 = ["a", "b", "d"]

To get the items of l1 that are not in l2 (here "c"), try the code below:

l3 = [i for i in l1 if i not in l2]

# l3 now == ["c"]
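
Coming back to the original question about a sequence-aware diff without the -/+ annotations: one option may be difflib.SequenceMatcher, which works on any sequences of hashable items and reports positions rather than annotated text. The sketch below uses the lists from the question. Note that SequenceMatcher is not guaranteed to find a true longest common subsequence or a minimal edit script, so treat the result as an approximation; the names common and changes are made up for the example.

```python
from difflib import SequenceMatcher

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()

# autojunk=False turns off the "popular element" heuristic used for long inputs.
sm = SequenceMatcher(None, t1, t2, autojunk=False)

# Lines matched in both lists, in order (the final block has size 0).
common = []
for block in sm.get_matching_blocks():
    common.extend(t1[block.a:block.a + block.size])
print(common)  # ['one 1', 'three 3']

# Everything that is not an 'equal' run, with the slices from each list.
changes = [(tag, t1[i1:i2], t2[j1:j2])
           for tag, i1, i2, j1, j2 in sm.get_opcodes()
           if tag != 'equal']
print(changes)  # [('replace', ['two 2'], ['two 29'])]
```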