How to find the end of an XML document in a stream with expat?


I have a binary stream which contains concatenated XML documents. The stream is processed in chunks of arbitrary size using calls like this:

int expat_status = XML_Parse(parser->expat, buffer, buffer_size, 0);

How can I detect that a particular chunk of data contains the last byte of the currently parsed XML document and retrieve its position so that I can restart the parser from the next byte to parse the next XML document which follows in the stream?


There is 1 answer

Michał Trybus (best answer)

After trying for a while, the best solution I have found so far is to watch for XML_Parse failing with the error code XML_ERROR_JUNK_AFTER_DOC_ELEMENT (as reported by XML_GetErrorCode). In that case, XML_GetCurrentByteIndex returns the index of the first byte of the "junk", which is where the next document begins.

From the expat documentation (about XML_GetCurrentByteIndex and the other functions in this group):

The position reported is that of the first of the sequence of characters that generated the current event (or the error that caused the parse functions to return 0.)

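This index is relative to the beginning of the data fed to the parser for the current document, not to the current chunk. The number of bytes passed to XML_Parse in earlier calls therefore has to be accumulated and subtracted from the index to get the offset of the junk within the current chunk, and from that the length of the leftover data.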

Then the parser can be reset (or a new one created) and fed the data starting at that offset in the current chunk, which begins parsing the next XML document in the stream.