I am trying to read XML files from a GCS location by following the XmlIO documentation:
https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/xml/XmlIO.html
There seems to be a problem with my configuration, and I am missing something that is required to make the code run. I placed the XML file in a GCS bucket and used the code below to read it.
    public class XMLReaderWriter {
        private static final Logger LOG = LoggerFactory.getLogger(XMLReaderWriter.class);

        public static void main(String[] args) {
            DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
            options.setTempLocation("gs://xyz_test/staging");
            options.setProject("test-1-160106");
            Pipeline p = Pipeline.create(options);

            // Read every <title> element under the <catalog> root as a Record.
            PCollection<Record> result = p.apply(XmlIO.<Record>read()
                    .from("gs://xyz_test/sample.xml")
                    .withRootElement("catalog")
                    .withRecordElement("title")
                    .withRecordClass(Record.class));

            // Print each record that was read.
            result.apply(ParDo.of(new DoFn<Record, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    System.out.println(c.element().toString());
                }
            }));

            p.run();
        }
    }
The code fails with an exception. Part of the stack trace is below:
    Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData$Record.<init>()
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:207)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
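The method named in the trace, GenericData$Record.<init>(), is a no-argument constructor on Avro's GenericData.Record, so the Record in my code may be resolving to the Avro class rather than to a class of my own. The sketch below only illustrates that kind of failure with plain Java reflection; it does not use Beam, JAXB, or Avro, and CatalogRecord and SchemaBoundRecord are hypothetical stand-ins, not classes from my pipeline.

```java
public class NoArgConstructorCheck {

    // Hypothetical record type: a plain class with a public no-argument constructor.
    public static class CatalogRecord {
        public String title;

        public CatalogRecord() {
        }
    }

    // Hypothetical stand-in for a class whose only constructor requires an argument.
    public static class SchemaBoundRecord {
        private final String schema;

        public SchemaBoundRecord(String schema) {
            this.schema = schema;
        }
    }

    // Returns true if the class declares a constructor that takes no arguments.
    public static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(CatalogRecord.class));     // prints true
        System.out.println(hasNoArgConstructor(SchemaBoundRecord.class)); // prints false
    }
}
```

If the same check returns false for the class passed to withRecordClass, that would be consistent with the error above, but I have not confirmed that this is the cause.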
Has anyone done this before? Please let me know what code changes I need to make.