Actual Header from CSV file

Supercategory   Segment Subsegment  Class   Total Property (Brand + License)    Corporate Manufacturer  Manufacturer    Brand   License Licensors   Model Number    Item Description    Jan 2022    Feb 2022    Jan 2022    Feb 2022    Jan 2022    Feb 2022    Jan 2022    Feb 2022    Jan 2022    Feb 2022    Jan 2022    Feb 2022

Output Expected:

COUNTRY LOCAL_CURRENCY  YEAR    MONTH   MONTH_YEAR  Supercategory   Segment Subsegment  Class   Total Property (Brand + License)    Corporate Manufacturer  Manufacturer    Brand   License Licensors   Model Number    Item Description    UNITS   AMOUNT_LC   AMOUNT_USD  % Distribution - Weighted   % of Stores Selling - Unweighted    $ Velocity - Weighted   Unit Velocity - Weighted    EXCHANGE_RATE

Below is the picture with Data in Actual file format and expected output (normalized)

Sample shown image attached

1

There are 1 answers

1
Saideep Arikontham On

The expected output you want can be achieved with the help of dataflows in azure data factory (ADF studio -> Author -> Dataflow -> New dataflow). The following is the csv header of the sample data that I am using (similar to your csv) as the source.

Category    Segment    total_property(brand+license)    Jan 2022    Feb 2022    Jan 2022    Feb 2022    Jan 2022    Feb 2022 

enter image description here

After configuring the source, create unpivot transformation to convert this column data of Jan 2022 and Feb 2022 to row data. Under this transformation, there are 3 important fields to be filled- Ungroup By, Unpivot Key and Unpivoted Columns. The following is how you fill these fields to achieve your requirement.

  • Ungroup By: Select all the columns other than columns Jan 2022 and Feb 2022

enter image description here

  • Unpivot Key: Name this column as Month_Year, check Enter values and enter the values as following.

enter image description here

  • Unpivoted Columns: Select Lateral column arrangement and add the columns that will be left after unpivot happens.

enter image description here

You can preview the data and the result would be like the following.

enter image description here

Since you want to add Month and Year column, create a derived column transformation. Add two columns naming Month with value as month(toDate(month_year,'MMM yyyy')) and Year with value as year(toDate(month_year,'MMM yyyy')). The final data preview would be the expected output you want to get.

enter image description here

Output:

enter image description here

Note: Since the csv has columns with same column names (Jan 2022 and Feb 2022), dataflow is taking the values as Jan 20223, Feb 20224 etc. But this does not affect the process.