XML Parse operator throws an error when working with a large XML file in IBM Streams


The XML Parse operator throws the following error when working with large XML files:

The following error occurred during XML parsing: internal error: Huge input lookup

According to the documentation, this was fixed in Streams 4.2.1.3, where the following parameter can be added to the XML Parse operator:

xmlParseHuge: true;

However, this parameter is not supported in earlier versions of Streams. How do I fix this in Streams 4.2.1.1?


There are 2 answers

Ankit Sahay (best answer)

There was no better way to do this in Streams 4.2.1.1, so I finally decided to use the topology toolkit to create a Python operator. The XML tuples were passed through this operator, which used the xml.etree.ElementTree library to parse the XML, extract the required data, and return the resulting tuple.
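A minimal sketch of the kind of parsing function such a Python operator might run, assuming each tuple carries one complete XML document as text. The element names (order, customer, price), the xmlData attribute, and the dict return shape are hypothetical placeholders; how the function is attached to the stream in the topology toolkit (for example through a map-style transform) and the output schema are not shown here.

```python
import xml.etree.ElementTree as ET


def parse_order(xml_text):
    """Parse one XML document and return only the fields needed downstream.

    The element and attribute names (order/@id, customer, price) are
    hypothetical placeholders; adapt them to your own documents.
    Returns None for malformed XML so such tuples can be filtered out.
    """
    try:
        root = ET.fromstring(xml_text)  # accepts str or bytes
    except ET.ParseError:
        return None
    # Collect every <price> element anywhere under the root.
    prices = [float(p.text) for p in root.iter("price")]
    return {
        "orderId": root.get("id"),
        "customer": root.findtext("customer", default=""),
        "total": sum(prices),
    }


def transform(tup):
    # 'xmlData' is a hypothetical attribute holding the XML text of one tuple.
    return parse_order(tup["xmlData"])


if __name__ == "__main__":
    doc = ('<order id="42"><customer>Ann</customer>'
           '<item><price>1.5</price></item><item><price>2.5</price></item></order>')
    print(transform({"xmlData": doc}))
```

Running the script prints {'orderId': '42', 'customer': 'Ann', 'total': 4.0}. Keeping the parsing in a plain function like this makes it easy to test outside Streams before wiring it into the topology.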

ndsilva

If the XML data comes from a FileSource operator, try the workaround of using a smaller block size when reading the file. Set blockSize to 10000u*1024u so that large XML files can be parsed successfully:

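stream<blob dataBlob, rstring fName> FileLoadedFromFS = FileSource(DirFileScanned)
{
    param
        format      : block;          // emit the file content as raw blocks of bytes
        blockSize   : 10000u * 1024u; // 10,240,000 bytes per block
        compression : gzip;           // input files are gzip-compressed
        parsing     : fast;
    output
        FileLoadedFromFS : fName = FileName(); // attach the source file name
}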

From: http://www-01.ibm.com/support/docview.wss?uid=swg1IT22914
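For readers less familiar with format : block, here is a rough Python analogue (not SPL, and not how FileSource is actually implemented) of what the configuration above describes: the gzip-compressed file is decompressed and cut into blocks of at most blockSize bytes, each of which would roughly correspond to the dataBlob of one tuple. The function name iter_blocks and the in-memory test data are illustrative assumptions.

```python
import gzip
import io

# Mirrors blockSize : 10000u*1024u from the SPL above (10,240,000 bytes).
BLOCK_SIZE = 10000 * 1024


def iter_blocks(raw, block_size=BLOCK_SIZE):
    """Yield the decompressed content of a gzip stream in blocks of block_size bytes.

    Every block except possibly the last is exactly block_size bytes long,
    loosely analogous to the dataBlob attribute of successive tuples.
    """
    with gzip.GzipFile(fileobj=raw, mode="rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of the decompressed stream
                break
            yield chunk


if __name__ == "__main__":
    payload = b"<root>" + b"x" * 25 + b"</root>"  # 38 bytes
    compressed = io.BytesIO(gzip.compress(payload))
    blocks = list(iter_blocks(compressed, block_size=10))
    print([len(b) for b in blocks])  # [10, 10, 10, 8]
```

Note that the blocks are cut at arbitrary byte offsets, not at element boundaries, so a single XML document can span several blocks.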