Convert very complex JSON to csv using ADF

110 views Asked by At

I have a very complex JSON format with multiple array's. Need the best solution to convert it to csv file using ADF.

JSON format example:

   {
   "elementCollection":[
      {
         "elementId":4643,
         "alias":"f97836a8-7c2e-4aaf-a0c1-551ce93b6ee9",
         "attributes":[
            {
               "Schedule":{
                  "id":"602",
                  "type":"Heading"
               }
            },
            {
               "Release_Name_in_BusinessLine_WS":{
                  "textValue":"xxx",
                  "formattedTextValue":"123",
                  "id":"545",
                  "type":"Text",
                  "xhtmlTextValue":"<div> 21</div>"
               }
            },
            {
               "Type":{
                  "expression":{
                     "valid":"true",
                     "pendingUpdate":"false",
                     "formula":"'Type' + 'EP'"
                  },
                  "textValue":"",
                  "formattedTextValue":"xxx",
                  "id":"544",
                  "type":"Text",
                  "xhtmlTextValue":"<div></div>"
               }
            }
         ]
      },
      {
         "elementId":6443,
         "alias":"f73a8cdd-c4e1-4b21-8253-44c237ea60c2",
         "attributes":[
            {
               "Schedule":{
                  "id":"602",
                  "type":"Heading"
               }
            },
            {
               "Release_Name_in_BusinessLine_WS":{
                  "textValue":"xxx",
                  "formattedTextValue":"123",
                  "id":"545",
                  "type":"Text",
                  "xhtmlTextValue":"<div> 21</div>"
               }
            },
            {
               "Type":{
                  "expression":{
                     "valid":"true",
                     "pendingUpdate":"false",
                     "formula":"'Type' + 'EP'"
                  },
                  "textValue":"",
                  "formattedTextValue":"xxx",
                  "id":"544",
                  "type":"Text",
                  "xhtmlTextValue":"<div></div>"
               }
            }
         ]
      }
   ]
}

I did try using aggregate and flattern settings but didn't get desired result as expected. Is there any other way to achieve this, kindly suggest. I want the output something like below format enter image description here Thanks in advance!

Update: By using two flatten transformation one for ElementCollection and other for Attributes I am able to see all the objects.But now the problem is with duplicates. For each object inside the attributes two rows are generated (one with null value and other with actual value).

Below is the screenshot how it looks when I do group by only for ReleaseId enter image description here

I have around 100 attributes and the total rowcount is 19.The output I get is 19*100 = 1900 rows. so for particular column i need to pick only the actual value. for example if a column has 100 rows only one row will have actual value. Is there any way to aggregate and pick actual values from all the columns.

Thanks in advance ! Is there any way to handle this scenario.

Below is the screnshot of the output I get. enter image description here

1

There are 1 answers

4
Rakesh Govindula On

As your JSON consists multiple objects inside the same array, Dataflow is only identifying the first object columns while using the transformations.

To achieve your requirement, you need to manually create the required columns in the dataflow and this approach only works for the above case only.

First flatten the elementCollection like below and get the elementId and attributes array columns output.

enter image description here

It will give the output like this.

enter image description here

Now, use a Derived column transformation and here, create the required column with below expressions.

Schedule_id - split(split(toString(attributes),'{"id":"')[2],'"')[1]

Schedule_type - split(split(toString(attributes),'"type":"')[2],'"')[1]

Release_Name_in_BusinessLine_WS_textValue - split(split(toString(attributes),'"textValue":"')[2],'"')[1]

Release_Name_in_BusinessLine_WS_formattedTextValue - split(split(toString(attributes),'"formattedTextValue":"')[2],'"')[1]

Type_formattedTextValue - split(split(toString(attributes),'"formattedTextValue":"')[3],'"')[1]

enter image description here

It will give the output like below.

enter image description here

Remove the unwanted column attributes column using select transformation.

enter image description here

Desired result:

enter image description here

Add your sink dataset to this dataflow and generate the csv file.

NOTE: All the data types of the columns will be string type only because they got generated from split(). So, convert those to respective data types before the sink.