Expat, pure C. How to ignore mismatched tags?


I have malformed XML (it comes from a vendor, and there is no realistic way to get it fixed). I am working with Expat 2.2.9 (gcc 9).

I was hoping to keep my own stack of tags with a priority hierarchy and forcibly close less important tags once a more important tag is closed. For example, consider this HTML:

<p><b>text</p>

The <p> has priority over <b>, so once I see </p> I also want to silently close <b>.
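
To make the idea concrete, here is a minimal sketch of such a tag stack in plain C. All names (TagStack, tag_priority, stack_push, stack_close) and the priority values are hypothetical, made up for illustration. The rule that an end tag is ignored when it would force-close a tag of equal or higher priority is also an assumption about the desired behaviour. This is only the bookkeeping; by itself it does not make Expat accept the input.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64
#define MAX_NAME  32

/* Hypothetical priority table: a higher number means a more important tag. */
static int tag_priority(const char *name)
{
    if (strcmp(name, "p") == 0) return 2;
    if (strcmp(name, "b") == 0) return 1;
    return 0;
}

typedef struct {
    char names[MAX_DEPTH][MAX_NAME];
    int depth;
} TagStack;

static void stack_init(TagStack *s)
{
    assert(s != NULL);
    s->depth = 0;
}

/* Record a start tag. Returns 0 on success, -1 if the stack or name is too big. */
static int stack_push(TagStack *s, const char *name)
{
    if (s->depth >= MAX_DEPTH || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(s->names[s->depth++], name);
    return 0;
}

/*
 * Handle an end tag. Returns how many elements were closed, counting the
 * matching one, or 0 if the end tag was ignored (no matching open tag, or
 * closing it would force-close a tag of equal or higher priority).
 */
static int stack_close(TagStack *s, const char *name)
{
    int i;

    /* Find the innermost open element with this name. */
    for (i = s->depth - 1; i >= 0; i--)
        if (strcmp(s->names[i], name) == 0)
            break;
    if (i < 0)
        return 0;

    /* Everything above the match must be strictly less important. */
    for (int j = i + 1; j < s->depth; j++)
        if (tag_priority(s->names[j]) >= tag_priority(name))
            return 0;

    int closed = s->depth - i;
    s->depth = i;
    return closed;
}
```

With this sketch, pushing p and then b and calling stack_close(&s, "p") closes both elements in one step, while a </b> that arrives with a <p> still open above it is ignored.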

But working with the standard example outline.c ( https://github.com/libexpat/libexpat/blob/master/expat/examples/outline.c ), I see that Expat does the tag matching itself.

$ ./outline < malformed.html
p
  b
Parse error at line 1:
mismatched tag
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$

So my question is: how do I tell Expat that I will do the tag matching myself and that XML_Parse() should not stop on such errors?

Or is there another C library that can handle such malformed XML?

There is 1 answer
Sebastian On

The XML specification requires XML parsers to be strict and not tolerate well-formedness errors. As a result, Expat does not offer a switch to ignore errors.
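
Since the parser itself cannot be made lenient, one possible workaround is to repair the input before it reaches XML_Parse(). The following is only a rough sketch of that idea, not something Expat provides: repair_tags and its helpers are hypothetical names, and the scanner assumes simple input (no '>' inside attribute values, comments or CDATA sections). It copies the text and inserts the end tags that are missing when an outer element is closed or the input ends; stray end tags with no open element are dropped. Priorities are left out to keep it short.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64
#define MAX_NAME  32

/* Append n bytes to out, keeping it NUL-terminated. Returns -1 if it does not fit. */
static int emit(char *out, size_t cap, size_t *len, const char *src, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, src, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

static int emit_end_tag(char *out, size_t cap, size_t *len, const char *name)
{
    return (emit(out, cap, len, "</", 2) == 0 &&
            emit(out, cap, len, name, strlen(name)) == 0 &&
            emit(out, cap, len, ">", 1) == 0) ? 0 : -1;
}

/* Copy `in` to `out`, adding missing end tags. Returns 0 on success, -1 on error. */
static int repair_tags(const char *in, char *out, size_t cap)
{
    char stack[MAX_DEPTH][MAX_NAME];   /* names of currently open elements */
    int depth = 0;
    size_t len = 0;
    const char *p = in;

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (*p) {
        const char *lt = strchr(p, '<');
        if (!lt) {                      /* trailing character data */
            if (emit(out, cap, &len, p, strlen(p)) < 0) return -1;
            break;
        }
        if (emit(out, cap, &len, p, (size_t)(lt - p)) < 0) return -1;

        const char *gt = strchr(lt, '>');
        if (!gt) {                      /* unterminated markup: copy as is */
            if (emit(out, cap, &len, lt, strlen(lt)) < 0) return -1;
            break;
        }
        size_t taglen = (size_t)(gt - lt) + 1;

        if (lt[1] == '!' || lt[1] == '?') {
            /* Declaration, comment or processing instruction: pass through. */
            if (emit(out, cap, &len, lt, taglen) < 0) return -1;
        } else {
            int is_end = (lt[1] == '/');
            const char *name = lt + 1 + is_end;
            size_t nlen = strcspn(name, " \t\r\n/>");
            if (nlen == 0 || nlen >= MAX_NAME) return -1;

            if (!is_end) {
                if (emit(out, cap, &len, lt, taglen) < 0) return -1;
                if (gt[-1] != '/') {    /* not an empty-element tag */
                    if (depth >= MAX_DEPTH) return -1;
                    memcpy(stack[depth], name, nlen);
                    stack[depth][nlen] = '\0';
                    depth++;
                }
            } else {
                int i;
                for (i = depth - 1; i >= 0; i--)
                    if (strlen(stack[i]) == nlen &&
                        memcmp(stack[i], name, nlen) == 0)
                        break;
                /* Close inner elements first, then the matching one. */
                while (i >= 0 && depth > i)
                    if (emit_end_tag(out, cap, &len, stack[--depth]) < 0)
                        return -1;
                /* i < 0: stray end tag, dropped on purpose. */
            }
        }
        p = gt + 1;
    }

    /* Close whatever is still open at end of input. */
    while (depth > 0)
        if (emit_end_tag(out, cap, &len, stack[--depth]) < 0)
            return -1;
    return 0;
}
```

Calling repair_tags("<p><b>text</p>", out, sizeof out) leaves "<p><b>text</b></p>" in out, which can then be passed to XML_Parse() as usual. For real vendor data the scanner would need to be hardened for the cases listed above.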