I have a binary stream which contains concatenated XML documents. The stream is processed in chunks of arbitrary size using calls like this:
int expat_status = XML_Parse(parser->expat, buffer, buffer_size, 0);
How can I detect that a particular chunk of data contains the last byte of the currently parsed XML document and retrieve its position so that I can restart the parser from the next byte to parse the next XML document which follows in the stream?
After trying for a while, the best solution I have found so far is to watch for XML_Parse failing with the error code XML_ERROR_JUNK_AFTER_DOC_ELEMENT (as reported by XML_GetErrorCode). In that case XML_GetCurrentByteIndex can be used to obtain the index of the first byte in the stream that contains "junk", i.e. the first byte of the next document.
The expat documentation (about XML_GetCurrentByteIndex and the other functions from this group) describes the reported position as that of the first character of the sequence that generated the current event or error.
This index is relative to the beginning of the document stream, so the number of bytes consumed by previous calls to XML_Parse has to be accumulated and subtracted from it to find where the junk starts inside the current chunk, and from that the length of the additional data in the chunk.
Then the parser can be restarted and run from the calculated position inside the current chunk to start processing the next XML document that follows in the stream.