I am working with a PySpark DataFrame that has a JSON column from CSV file budgetthresholdaction. The JSON structure looks like this:
{
"budgetThresholdAction0threshold0": {
"thresholdValue": 80,
"action": "promotionsetup.BudgetThresholdPercentageParameter"
},
"budgetThresholdAction1action1": {
"thresholdName": "budgetThresholdAction1threshold0",
"action": "promotionsetup.BudgetTerminateActionParameter"
},
"budgetThresholdAction1action2": {
"thresholdName": "budgetThresholdAction1threshold0",
"action": "promotionsetup.BudgetTurnOffMessagingActionParameter"
},
"budgetThresholdAction1threshold0": {
"name": "budgetThresholdAction1threshold0",
"thresholdValue": 95,
"action": "promotionsetup.BudgetThresholdPercentageParameter"
},
"budgetThresholdAction1action0": {
"thresholdName": "budgetThresholdAction1threshold0",
"action": "promotionsetup.BudgetNotifyActionParameter"
},
"budgetThresholdAction0action0": {
"thresholdName": "budgetThresholdAction0threshold0",
"action": "promotionsetup.BudgetNotifyActionParameter"
}
}
I need to create two separate columns, threshold and action, in my PySpark DataFrame with the following values:
threshold |
action |
|---|---|
| 80 | promotionsetup.BudgetThresholdPercentageParameter |
| 95 | promotionsetup.BudgetNotifyActionParameter |
| 95 | promotionsetup.BudgetTurnOffMessagingActionParameter |
| 95 | promotionsetup.BudgetTerminateActionParameter |
I've attempted to use get_json_object and when functions, but I'm struggling to get the desired output. Could you please guide me on how to achieve this transformation in PySpark?
What I tried :
# To get threshold values as array and then explode it using :
df.select(array(get_json_object(col('budgetthresholdaction'),'$.budgetThresholdAction0threshold0.thresholdValue'),get_json_object(col('budgetthresholdaction'),'$.budgetThresholdAction1threshold0.thresholdValue')).alias('T')).show(1,False)
But I'm struggling to map Threshold and Action Values as for single threshold there can be multiple action.
We can have more than 2 Threshold and for every threshold it can have 1 or more action.
Mapping is :
ActionNThreshold0 => ActionNActionX
For example :
Action0Threshold0 = [ Action0action0, Action0action1,Action0action2...]
Action1Threshold0 = [ Action1action0,Action1action1...]