How to generate data in python pandas for promotion planning use-case?

73 views Asked by At

Background : The business objective is to build a ML model which predicts lift in profit based on historical performance of the product during both promotional period as well as non-promotional periods.

I have built a ML model which predicts profit on daily level for each asin based on historical data using following inputs : Date, Product ID, Promotion flag , Promotion type, discount percent, current price.

Now I want to generate inference data by building a dataset which has all combinations of productID and promotion types along with several levels of discount percent for each day in the future. Is there a way to do this in pandas ?

Conditions : Given a date range (for ex. 3/17/23 - 3/27/23), promotion types - 3 types of promotions (bd,pd,ld), discount percent [0,5,10]

generate data for each day in the date range for each productID with different combinations of promotion type and discount percent.

Expected dataframe output :

This is just an example for two dates (3/17 and 3/18), 1 productID and all combinations for promotion type and discount percent. I want to create this for entire range of dates for each productID.

Expected output screenshot

1

There are 1 answers

0
thmslmr On

Looking at the example of the expected output you provided, here is a simple way to do it: list the different values you want to see in your dataframe, generate all combinations using itertools.product, and finally create the dataframe with the columns you want.

from itertools import product
import pandas as pd

# Configuration
products_ids = [1, 2, 3]
date_range = ["2023-03-17", "2023-03-27"]
discount_types = ["bd", "pd", "ld"]
discount_prct = [0, 5, 10]

# Get all dates in range
all_dates = pandas.date_range(date_range[0], date_range[1], freq='d')

# Get all combinations
combinations = product(products_ids, all_dates, discount_types, discount_prct)

# Create the dataframe
df = pd.DataFrame(combinations, columns=["ProductID", "Date", "DiscountType", "DiscountPercent"])

Here are the 15th first rows:

ProductID Date DiscountType DiscountPercent
0 1 2023-03-17 00:00:00 bd 0
1 1 2023-03-17 00:00:00 bd 5
2 1 2023-03-17 00:00:00 bd 10
3 1 2023-03-17 00:00:00 pd 0
4 1 2023-03-17 00:00:00 pd 5
5 1 2023-03-17 00:00:00 pd 10
6 1 2023-03-17 00:00:00 ld 0
7 1 2023-03-17 00:00:00 ld 5
8 1 2023-03-17 00:00:00 ld 10
9 1 2023-03-18 00:00:00 bd 0
10 1 2023-03-18 00:00:00 bd 5
11 1 2023-03-18 00:00:00 bd 10
12 1 2023-03-18 00:00:00 pd 0
13 1 2023-03-18 00:00:00 pd 5
14 1 2023-03-18 00:00:00 pd 10