How to parse EDIFACT file data using apache spark?

Question

1.2k views Asked by VVGSRK At 12 November 2018 at 13:29

Can someone suggest how to parse EDIFACT-format data using Apache Spark?

I have a requirement where EDIFACT data is written to an AWS S3 bucket every day. I am trying to find the best way to convert this data to a structured format using Apache Spark.

Original Q&A

There is 1 answer

**Emiliano Martinez** · Answer 1 · 2018-11-12T14:28:12+00:00

If your invoices are in EDIFACT format, you can read each one as a single String per invoice using RDDs. You will then have an RDD[String] that represents the distributed collection of invoices. Take a look at https://github.com/CenPC434/java-tools, which you can use to convert the EDIFACT strings to XML. The repo https://github.com/databricks/spark-xml shows how to use XML as an input source to create DataFrames and run queries, aggregations, and so on.
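To make the "one string per invoice" step concrete, here is a minimal PySpark sketch. It does not use the EDIFACT-to-XML converter linked above; it swaps in a plain tokenizer, so treat it as an alternative under stated assumptions rather than the approach from the answer. The function names (`service_chars`, `parse_segments`, `message_rows`, `edifact_to_dataframe`) and the column names are hypothetical. It assumes each file holds one interchange, honors an optional `UNA` header, keeps release characters in the element text, and leaves composite elements (the `:`-separated parts) unsplit.

```python
def service_chars(text):
    """Return (element_sep, release_char, segment_term, body), honoring an optional UNA header."""
    if text.startswith("UNA") and len(text) >= 9:
        # UNA is followed by six service characters, e.g. UNA:+.? '
        return text[4], text[6], text[8], text[9:]
    return "+", "?", "'", text  # EDIFACT defaults


def parse_segments(text):
    """Tokenize an interchange into segments, each a list of raw data elements."""
    element_sep, release, terminator, body = service_chars(text)
    segments, elements, current = [], [], []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == release and i + 1 < len(body):
            # Keep the escape pair as-is so the escaped character is not treated as a separator.
            current.append(body[i:i + 2])
            i += 2
            continue
        if ch == element_sep:
            elements.append("".join(current))
            current = []
        elif ch == terminator:
            elements.append("".join(current))
            segments.append(elements)
            elements, current = [], []
        elif ch not in "\r\n":  # skip line breaks that some files put between segments
            current.append(ch)
        i += 1
    return segments


def message_rows(text):
    """Return one row per segment inside each UNH..UNT message."""
    rows, header, position = [], None, 0
    for seg in parse_segments(text):
        tag = seg[0]
        if tag == "UNH":
            # UNH carries the message reference and the message type composite.
            ref = seg[1] if len(seg) > 1 else ""
            mtype = seg[2].split(":")[0] if len(seg) > 2 else ""
            header, position = (ref, mtype), 0
        if header is not None:
            rows.append(header + (position, tag, seg[1:]))
            position += 1
            if tag == "UNT":
                header = None
    return rows


def edifact_to_dataframe(spark, path):
    """Build a DataFrame of segments from every EDIFACT file under path."""
    # wholeTextFiles yields (file_path, file_content) pairs, one per file.
    files = spark.sparkContext.wholeTextFiles(path)
    rows = files.flatMap(lambda kv: [(kv[0],) + row for row in message_rows(kv[1])])
    schema = ("file string, message_ref string, message_type string, "
              "position int, tag string, elements array<string>")
    return spark.createDataFrame(rows, schema)
```

With an existing `SparkSession`, a call such as `edifact_to_dataframe(spark, "s3a://my-bucket/edifact/")` (the bucket name is a placeholder) would give one row per segment, which can then be filtered by `tag` or grouped by `message_ref`. Very large files may need a different reading strategy, since `wholeTextFiles` loads each file into memory as a single string.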
