I have a system that exports data in the XML Spreadsheet 2003 (worksheet) format. I need to load it into BigQuery using Data Fusion or any other process built on GCP resources.
- Is it possible to do this with Data Fusion?
- I have followed the XML transformation process shown in https://www.youtube.com/watch?v=e-5K4cxwGrc&feature=youtu.be. So far I have reached a point where the header and data rows appear in different rows but in the same column. I cannot parse them any further into individual columns using Wrangler, because each parsing step just keeps isolating the JSON key:value pairs into different rows of that same column.
As I am new to Data Fusion, I would appreciate some detailed guidance.
This can be implemented using Data Fusion.
Once the file is available in Wrangler (either uploaded directly or read through a source connection) and you have applied the XML to JSON transformation, add a JSON parsing step on the resulting column so that it is parsed into separate columns [1]. This adds another transformation to the Wrangler recipe.
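If the Wrangler route stays awkward, the same flattening can be sketched outside Data Fusion, for example in a small Python step run on GCP. The following is only a minimal sketch using the standard library, assuming the export follows the usual XML Spreadsheet 2003 layout (Workbook > Worksheet > Table > Row > Cell > Data) and that the first row holds the headers. The function names `worksheet_rows` and `to_records` and the `SAMPLE` document are hypothetical, made up for illustration, and are not taken from your export.

```python
import xml.etree.ElementTree as ET

# Namespace used by the XML Spreadsheet 2003 format.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def worksheet_rows(xml_text):
    """Return every Row of the first Worksheet as a list of cell strings."""
    root = ET.fromstring(xml_text)
    table = root.find("ss:Worksheet/ss:Table", NS)
    rows = []
    for row in table.findall("ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            # ss:Index (1-based) means empty cells were skipped in the file;
            # pad with empty strings so the columns stay aligned.
            index = cell.get(f"{{{SS}}}Index")
            if index is not None:
                values.extend([""] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            values.append("" if data is None or data.text is None else data.text)
        rows.append(values)
    return rows


def to_records(rows):
    """Assumption: the first row is the header; pair it with each data row."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Hypothetical sample document, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">id</Data></Cell>
    <Cell><Data ss:Type="String">name</Data></Cell>
    <Cell><Data ss:Type="String">city</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">Oslo</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""

records = to_records(worksheet_rows(SAMPLE))
```

Each record is then a plain dictionary keyed by column name. Writing the records out as newline-delimited JSON or CSV gives a file in a format that BigQuery load jobs accept. All values come out as strings here, so any type conversion would still need to happen in the BigQuery schema or in an extra step.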
I would also suggest taking a look at the GCP documentation for Data Fusion, which is very self-explanatory [2].
[1] Column transformations -> Parse -> JSON
[2] https://cloud.google.com/data-fusion/docs