ADF dataflow and columns/rows in separate array in JSON

1.4k views Asked by At

I have a bunch of json files which have an array with column names and a separate array for the rows. I want a dynamic way of retrieving column names and merge them with the rows for each json file. Been playing around with derived columns and column patterns, but struggling to get it working.

I want the column names from [data.column.shortText] and values for each corresponding [data.rows.value] according to the order.

Example format

{
   "messages":{
      
   },
   "data":{
      "columns":[
         {
            "columnName":"SelectionCriteria1",
            "shortText":"Case no."
         },
         {
            "columnName":"SelectionCriteria2",
            "shortText":"Period for periodical values",
         },
         {
            "columnName":"SelectionCriteria3",
            "shortText":"Location"
         },
         {
            "columnName":"SelectionCriteriaAggregate",
            "shortText":"Value"
         }
      ],
      "rows":[
         [
            {
               "value":"23523"
            },
            {
               "value":12342349
            },
            {
               "value":"234234",
               "code":3342
            },
            {
               "value":234234234
            }
         ]
      ]
   }
}
1

There are 1 answers

0
Sally Dabbah On

First, you need to fix your Json data, i can see you have an extra comma in columns second Json and in rows you have value as int and as string so when i tried to parse it in ADF i got an error.

i don't quite understand why you're trying to do merge by position because normally we get rows more than columns, and if you'll get 5 rows and 3 columns you will get an error.

Here is my approach to your problem: the main idea is that i added index column to both arrays and joined the jsons by Inner Join.

  1. created a Source Data (its 2 but you can make it one to simplify your data flow)
  2. added Select activity to select relevant arrays from the data.
  3. flattened the array(in order to add index column)
  4. added index by using rank activity (please read more about rank and dense rank and what is the difference between the two)
  5. added a Join activity , inner join by index column.
  6. Select activity to remove index column from the result.
  7. saved output to sink.

Json Data that i worked with: enter image description here

Data Flow: enter image description here

SelectRows Activity: enter image description here

Flatten Activity: enter image description here

Rank actitity: enter image description here

Join activity: enter image description here

please check these links:

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-expressions-usage#mapAssociation

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-map-functions