I have data in which it has one column named category. Categories are chair and table.
- But Issue is that I want to to perform count for table,chair and two more categories bowl and plat which is not present in my data.
- I want to get count for those all 4 categories from data. If not found that category it should write zero for its count. So how to do that in adf?
I have tried to do filtering, derived column but derived column give me additional column but i want it at row level.
As the remaining values doesn't exists in the column, Dataflow can't give the count for those.
So, you can follow below approach to achieve your requirement. In this approach, you need an array containing values of all categories like
['chair','table','bowl','plat'].This is starting data that I am taking in the dataflow source:
First use an aggregate transformation to get the category and count columns.
Give
categorycolumn in the Group by section and create a new columncountwithcount(category)expression in the aggregate section like below.It will give the result with existing values.
Now, take another aggregate transformation after this. In this, leave group by section empty and in the aggregate section, create a column for an array of existing categories using
collect(category)expression.It will produce below results.
Next, take a derived column transformation and create two columns using the above
originalarray column and the array of all categories.This will first generate an array of non-existing categories and unfolds that array as rows. Those will be converted to string type column like below. Coming to
count2columns, whatever the values that are coming from this transformation, the count for those will be0only.Now, use a select transformation and remove the extra
originalcolumn.As a result, select transformation will only contains two columns.
Now, create a New branch after the aggregate1 transformation and add a Union transformation to it. Use the above select1 transformation as another input and do the union by position.
This will give the desired result.
Add your sink to the union transformation and you can run the dataflow.