Compare XML files using Python


I want to compare these two xml files:

File1.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>

File2.xml:

<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>

I've used xmldiff to compare a.xml with b.xml:

from xmldiff import main, formatting

def compare_xmls(observed, expected):
    # Diff the two files and return the result as formatted text.
    formatter = formatting.DiffFormatter()
    diff = main.diff_files(observed, expected, formatter=formatter)
    return diff

# File names must be passed as strings.
out = compare_xmls("a.xml", "b.xml")
print(out)

OUTPUT:

[delete, /ngs_sample/results/gastro_prelim_st/type[2]]

Does anyone know how to identify the actual difference between the two XML files, i.e. what has been deleted compared to b.xml? Can anyone recommend another way of comparing XML files in Python?


There are 3 answers

r.ook On BEST ANSWER

You can switch to the XMLFormatter and manually filter out the results:

...
# Change formatter:
formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)

...

# after `out` has been retrieved:
import re
for i in out.splitlines():
  if re.search(r'\bdiff:\w+', i):
    print(i)

# Result:
#       <type st="9999" diff:delete=""/>
Victor 'Chris' Cabral On

Use xmldiff to perform this exact task.

main.py

from xmldiff import main
diff = main.diff_files("file1.xml", "file2.xml")
print(diff)

output

[DeleteNode(node='/ngs_sample/results/gastro_prelim_st/type[2]')]
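The action only reports where the deleted node was, not what it contained. As a rough sketch, the reported path can be looked up in the first file with the standard library to see the element itself. The helper name `resolve` is made up for illustration, the XPath string is copied from the output above, and the sketch assumes the path starts with the root tag and uses only simple positional predicates such as `type[2]`, which `xml.etree.ElementTree` supports.

```python
import xml.etree.ElementTree as ET

# Contents of File1.xml from the question, inlined so the sketch is self-contained.
FILE1 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>"""

# XPath copied from the DeleteNode action printed above.
DELETED_XPATH = "/ngs_sample/results/gastro_prelim_st/type[2]"


def resolve(root, xpath):
    """Return the element an absolute XPath points to (hypothetical helper).

    ElementTree cannot evaluate absolute paths on an element, so the leading
    root segment is checked by hand and the rest is searched relative to root.
    """
    parts = xpath.strip("/").split("/", 1)
    if parts[0] != root.tag:
        raise ValueError(f"path does not start at root <{root.tag}>")
    if len(parts) == 1:
        return root
    return root.find("./" + parts[1])


root = ET.fromstring(FILE1)
deleted = resolve(root, DELETED_XPATH)
print(deleted.tag, deleted.attrib)
# type {'st': '9999'}
```

With real files, `ET.parse("file1.xml").getroot()` would take the place of `ET.fromstring(FILE1)`.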
Karl On

Another option is to use xml2 (https://github.com/clone/xml2) together with something like bash process substitution:

$ diff --color <(xml2 < File1.xml) <(xml2 < File2.xml)

7,8d6
< /ngs_sample/results/gastro_prelim_st/type
< /ngs_sample/results/gastro_prelim_st/type/@st=9999
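The same flatten-then-diff idea can be approximated in pure Python, which may help if xml2 is not available. This is only a sketch: `flatten` and `xml_diff_lines` are made-up names, and the line format is loosely modelled on the xml2 output above rather than being an exact reimplementation. It uses only `xml.etree.ElementTree` and `difflib` from the standard library.

```python
import difflib
import xml.etree.ElementTree as ET

FILE1 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
      <type st="9999" />
  </gastro_prelim_st>
 </results>
</ngs_sample>"""

FILE2 = """<ngs_sample id="40332">
  <workflow value="salmonella" version="101_provisional" />
  <results>
  <gastro_prelim_st reason="not novel" success="false">
      <type st="1364" />
   </gastro_prelim_st>
 </results>
</ngs_sample>"""


def flatten(elem, prefix=""):
    """Yield one line per element, attribute, and non-blank text node."""
    path = f"{prefix}/{elem.tag}"
    yield path
    for name, value in elem.attrib.items():
        yield f"{path}/@{name}={value}"
    text = (elem.text or "").strip()
    if text:
        yield f"{path}={text}"
    for child in elem:
        yield from flatten(child, path)


def xml_diff_lines(left_xml, right_xml):
    """Return only the lines that were removed ('- ') or added ('+ ')."""
    left = list(flatten(ET.fromstring(left_xml)))
    right = list(flatten(ET.fromstring(right_xml)))
    return [
        line
        for line in difflib.ndiff(left, right)
        if line.startswith(("- ", "+ "))
    ]


for line in xml_diff_lines(FILE1, FILE2):
    print(line)
# - /ngs_sample/results/gastro_prelim_st/type
# - /ngs_sample/results/gastro_prelim_st/type/@st=9999
```

Because whitespace-only text is stripped before comparing, the differing indentation of the closing tags in the two files does not show up as a change.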