I am trying to parse streaming XML data with Spark Streaming. I receive the input from Kafka as a JavaPairInputDStream and then work on the RDDs it produces. The data in my RDD looks like this:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
I am getting the data line by line. Next, I am trying to parse it with a StAX parser. Here is my code:
XMLStreamReader reader;
XMLInputFactory factory = XMLInputFactory.newInstance();
InputStream in = IOUtils.toInputStream(items._2, "UTF-8");
reader = factory.createXMLStreamReader(in);
When I try it this way, I get:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,7]
Message: XML document structures must start and end within the same entity.
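To narrow this down, here is a minimal sketch that reproduces the symptom outside Spark. It assumes that each record's value (items._2) holds a single line such as <note> rather than the whole document, which would match the reported position [1,7] (one column past the six characters of <note>). The class name StaxLineRepro and the helper parses are made up for illustration and are not part of my job.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxLineRepro {
    // Hypothetical helper: returns true if the text parses as one
    // complete, well-formed XML document, false if StAX rejects it.
    static boolean parses(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // Walk every event so the parser reaches the end of the input.
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: one Kafka record carries only this single line.
        String singleLine = "<note>";
        // The same sample with all lines joined into one document.
        String wholeDocument = String.join("\n",
                "<note>",
                "<to>Tove</to>",
                "<from>Jani</from>",
                "<heading>Reminder</heading>",
                "<body>Don't forget me this weekend!</body>",
                "</note>");
        System.out.println("single line parses:    " + parses(singleLine));
        System.out.println("whole document parses: " + parses(wholeDocument));
    }
}
```

If the single line fails and the joined document succeeds, the parser itself is fine and the problem is that each record contains only a fragment of the XML.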
When I try it this way instead:
reader = factory.createXMLStreamReader(new FileReader(items._2));
I get:
16/12/27 15:17:26 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.io.FileNotFoundException: <note> (No such file or directory)
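For context on this second error: FileReader(String) treats its argument as a file path, so the record value <note> is looked up as a file name, whereas StringReader reads the characters of the string itself. The sketch below only illustrates that difference. The class name ReaderSourceDemo and both helper methods are hypothetical, and it assumes no file named <note> exists in the working directory.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderSourceDemo {
    // Hypothetical helper: true only if a file with this name can be opened.
    static boolean opensAsFile(String value) {
        try (Reader r = new FileReader(value)) {
            return true;
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException.
            return false;
        }
    }

    // Hypothetical helper: reads the string's own characters back.
    static String readInMemory(String value) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new StringReader(value)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String recordValue = "<note>"; // assumed content of one record
        System.out.println("opens as file: " + opensAsFile(recordValue));
        System.out.println("read in memory: " + readInMemory(recordValue));
    }
}
```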