How to compare two XML files in a Python script?


I am new to Python. I have some predefined XML files and a script that generates new XML files. I want to write an automated script that compares the XML files and stores the names of the files that differ in an output file. Thanks in advance.


There are 4 answers

0
AudioBubble On

I think you're looking for the filecmp module. You can use it like this:

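import filecmp

files_equal = filecmp.cmp('f1.xml', 'f2.xml')

# Record the file name only when the files differ.
# out_file is assumed to be a file object already opened for writing.
if not files_equal:
    out_file.write('f1.xml\n')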

Replace f1.xml and f2.xml with your xml files.

0
Brionius On

Building on @Xaranke's answer:

import filecmp

out_file = open("diff_xml_names.txt", "w")  # open for writing; the default mode is read-only
# Not sure what format your filenames will come in, but here's one possibility.
filePairs = [('f1a.xml', 'f1b.xml'), ('f2a.xml', 'f2b.xml'), ('f3a.xml', 'f3b.xml')]

for f1, f2 in filePairs:
    if not filecmp.cmp(f1, f2):
        # Files are not equal
        out_file.write(f1+'\n')

out_file.close()
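
If the predefined and the generated files live in two directories and share file names, the pairing can be derived instead of hard-coded. Here is a minimal sketch under that assumption; the function name `write_differing_names` and its parameters are made up for illustration. It uses `filecmp.cmpfiles` with `shallow=False`, so the file contents are compared rather than only the `os.stat()` signatures.

```python
import filecmp
import os


def write_differing_names(expected_dir, generated_dir, out_path):
    """Write the names of same-named .xml files whose bytes differ."""
    expected = {n for n in os.listdir(expected_dir) if n.endswith(".xml")}
    generated = {n for n in os.listdir(generated_dir) if n.endswith(".xml")}
    common = sorted(expected & generated)

    # cmpfiles returns three lists of names: equal, different, not comparable
    match, mismatch, errors = filecmp.cmpfiles(
        expected_dir, generated_dir, common, shallow=False)

    # Treat files that could not be compared as differing, too
    differing = sorted(mismatch + errors)
    with open(out_path, "w") as out_file:
        for name in differing:
            out_file.write(name + "\n")
    return differing
```

Files that exist in only one of the two directories are ignored by this sketch; whether they should count as differing depends on your setup.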
0
Pankaj Raheja On

What about the following snippet:

import xml.etree.ElementTree as ET

class XmlComparator(object):
    # Hypothetical wrapper class: the original snippet showed only the methods.

    def separator(self):
        return "!@#$%^&*"  # Very ugly separator

    def _traverseXML(self, xmlElem, tags, xpaths):
        # Record "path + separator + text" for this element and all descendants.
        tags.append(xmlElem.tag)
        for e in xmlElem:
            self._traverseXML(e, tags, xpaths)

        text = ''
        if xmlElem.text:
            text = xmlElem.text.strip()

        xpaths.add("/".join(tags) + self.separator() + text)
        tags.pop()

    def _xmlToSet(self, xml):
        xpaths = set()  # output
        tags = list()
        root = ET.fromstring(xml)
        self._traverseXML(root, tags, xpaths)

        return xpaths

    def _areXMLsAlike(self, xml1, xml2):
        # Equal sets mean the same tag paths with the same stripped text.
        xpaths1 = self._xmlToSet(xml1)
        xpaths2 = self._xmlToSet(xml2)

        return xpaths1 == xpaths2
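
To show what this path-set comparison does and does not catch, here is a minimal, self-contained sketch of the same idea written as plain functions. The names `xml_to_set` and `are_xmls_alike` and the sample documents are made up for illustration, and the input is assumed to be XML text rather than a file name.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "!@#$%^&*"  # same separator string as in the snippet above


def xml_to_set(xml_text):
    """Return a set with one 'tag/path + SEPARATOR + text' entry per element."""
    entries = set()

    def walk(elem, tags):
        tags.append(elem.tag)
        for child in elem:
            walk(child, tags)
        text = elem.text.strip() if elem.text else ""
        entries.add("/".join(tags) + SEPARATOR + text)
        tags.pop()

    walk(ET.fromstring(xml_text), [])
    return entries


def are_xmls_alike(xml1, xml2):
    return xml_to_set(xml1) == xml_to_set(xml2)


a = "<root><x>1</x><y>2</y></root>"
b = "<root>\n  <y>2</y>\n  <x>1</x>\n</root>"  # reordered and reindented
c = "<root><x>1</x><y>3</y></root>"           # different text in <y>
d = '<root><x id="7">1</x><y>2</y></root>'    # extra attribute on <x>

print(are_xmls_alike(a, b))  # True
print(are_xmls_alike(a, c))  # False
print(are_xmls_alike(a, d))  # True, because attributes are never recorded
```

Because the entries go into a set, element order and repeated identical elements are ignored as well, which may or may not be what you want.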
0
janbrohl On

Do you mean comparing them byte-wise or for semantic equality? (Is <tag attr1="1" attr2="2" /> equal to <tag attr2="2" attr1="1" />?) If you want to check for semantic equality, have a look at "Xml comparison in Python".

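When generating XML, especially if plain dicts are used for the attributes somewhere, the attribute order can get mixed up, sometimes even when you run the same script with the same input. This applies to Python versions before 3.7, where dicts do not guarantee insertion order. From the documentation of dict: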

items()

...

CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
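
One way to make the comparison insensitive to attribute order is to canonicalize both documents before comparing them. This is a minimal sketch, assuming Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` (C14N 2.0) is available; the helper name `semantically_equal` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def semantically_equal(xml1, xml2):
    """Compare two XML strings by their C14N 2.0 canonical form."""
    # strip_text=True drops whitespace around text content, so pure
    # indentation differences do not count as a difference
    return (ET.canonicalize(xml1, strip_text=True) ==
            ET.canonicalize(xml2, strip_text=True))


print(semantically_equal('<tag attr1="1" attr2="2" />',
                         '<tag attr2="2" attr1="1" />'))  # True
print(semantically_equal('<tag attr1="1" />',
                         '<tag attr1="2" />'))            # False
```

For files on disk, `ET.canonicalize(from_file=path)` can be used in place of passing the XML text. Note that canonicalization keeps the order of child elements, so reordered elements still compare as different.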