Is this file an XML or HTML file? How can I parse it?


It's 100MB, so here's a portion of it: https://drive.google.com/file/d/0B1GVNHhYNzBINWl4TVFOejhtbEE/view?usp=sharing

It didn't come with an extension, so I added the .xml extension to it.

What file type is this, and how can I parse it? I tried untangle in Python and ran into errors.


kjhughes On BEST ANSWER

The file you reference is an XML export from a MediaWiki site.

See also the XSD that MediaWiki publishes for its export format.

You can parse it with a standard XML parser, which is available in most languages, including Python.
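For a file of around 100MB, a streaming parse is usually more comfortable than loading the whole tree at once. Here is a minimal sketch using `iterparse` from Python's standard library. The helper names `local_name` and `iter_pages` are made up for illustration, and the inline sample, including its namespace URI, is only an assumption about what your export looks like. The tag matching ignores the namespace, so it should not depend on the exact export version.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> element, one page at a time.

    `source` is a filename or a binary file object. If a page holds
    several revisions, the text of the last one in the file is returned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Release the page's children so memory use stays modest.
        elem.clear()


# Tiny illustrative sample; the real file would be passed by filename.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Hello, wiki.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

With the real file you would call `iter_pages("yourfile.xml")` instead of passing the sample. If untangle failed because it tried to build the whole document in memory, this approach may avoid that, though I can't tell from the question what the actual errors were.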