Loading XML 2003 Worksheet data into BigQuery with Data Fusion


I have a system that exports data in XML 2003 Worksheet format. I need to load it into BigQuery using Data Fusion or any other process built on GCP resources. So:

  • Is it possible to do this with Data Fusion?
  • I have followed the XML transformation process shown in https://www.youtube.com/watch?v=e-5K4cxwGrc&feature=youtu.be. So far, I have reached a point where the header and data rows appear in different rows but in the same column. I cannot parse them any further into individual columns using Wrangler: it just keeps isolating the JSON key:value pairs into different rows of the same column.

As I am new to Data Fusion, I would appreciate some detailed guidance.


There is 1 answer

Malaman On

This can be implemented using Data Fusion.

Once you have the file in Wrangler (either uploaded directly or read through a source connection) and have applied the XML to JSON transformation, you can add a JSON parsing operation on the resulting column so that it is parsed into separate columns [1]. This adds another transformation step in Wrangler.
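To make the target shape concrete (one column per header, one record per data row), here is a minimal Python sketch of the same reshaping done outside Data Fusion, which may also suit the "any other process" part of the question. It is only an illustration under stated assumptions: that the export uses the usual SpreadsheetML 2003 layout (Workbook > Worksheet > Table > Row > Cell > Data in the urn:schemas-microsoft-com:office:spreadsheet namespace), that the first row of the first worksheet holds the headers, and that all values can be kept as strings. The function names and the sample data are hypothetical, and this is not what Wrangler does internally.

```python
import json
import xml.etree.ElementTree as ET

# Namespace assumed for XML 2003 worksheets (SpreadsheetML 2003).
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Hypothetical sample export; real files carry more metadata and styles.
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">id</Data></Cell>
    <Cell><Data ss:Type="String">name</Data></Cell>
    <Cell><Data ss:Type="String">city</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell><Data ss:Type="String">alpha</Data></Cell>
    <Cell><Data ss:Type="String">Oslo</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">2</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">Paris</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def row_values(row):
    """Return the cell texts of one Row, honoring ss:Index for skipped cells."""
    values = []
    for cell in row.findall("ss:Cell", NS):
        index = cell.get(f"{{{SS}}}Index")  # 1-based column position, optional
        if index is not None:
            # Pad with None for every empty cell the exporter left out.
            while len(values) < int(index) - 1:
                values.append(None)
        data = cell.find("ss:Data", NS)
        values.append(data.text if data is not None else None)
    return values


def worksheet_to_records(xml_text):
    """Turn the first worksheet into a list of dicts keyed by the header row."""
    root = ET.fromstring(xml_text)
    table = root.find("ss:Worksheet/ss:Table", NS)
    rows = table.findall("ss:Row", NS) if table is not None else []
    if not rows:
        return []
    header = row_values(rows[0])
    records = []
    for row in rows[1:]:
        values = row_values(row)
        # Short rows are padded so every record has all header keys.
        values += [None] * (len(header) - len(values))
        records.append(dict(zip(header, values)))
    return records


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    print(to_ndjson(worksheet_to_records(SAMPLE)))
```

The output is newline-delimited JSON, a format that BigQuery load jobs accept. The load itself (for example from a Cloud Storage bucket) is not shown here, and type conversion is left out because every value is kept as a string.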

Additionally, I suggest looking at the GCP documentation for Data Fusion, which is largely self-explanatory [2].

[1] Column transformations -> Parse -> JSON

[2] https://cloud.google.com/data-fusion/docs