I'm working on a Lasso regression model with degree-2 PolynomialFeatures and running into scale issues with my retail dataset. The dataset covers 700 stores: 80% are small shops under 10,000 square feet, and 10% are large supermarkets between 60,000 and 80,000 square feet. Because the y-values correlate strongly with store size, my MSE-optimized model is biased towards the larger stores. I want a model that gives good estimates for both small and large stores. I'm currently scaling each feature by its x_max because it's the only option with which I've managed to rescale the interaction terms correctly.
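For reference, here is a minimal sketch of this setup. It assumes scikit-learn and uses hypothetical synthetic data: the second feature `other`, the coefficients, the size range of the middle 10%, and `alpha=1.0` are all made up for illustration. The x_max scaling is done with `MaxAbsScaler`, which divides each column by its maximum absolute value. For positive features, that is the same as dividing by x_max. Every degree-2 term is a product of scaled features, so its coefficient maps back to original units by dividing by the matching product of maxima:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, PolynomialFeatures

# Hypothetical synthetic stand-in for the 700-store dataset:
# ~80% small (<10k sq ft), ~10% mid-size (assumed 10k-60k), ~10% large (60k-80k).
rng = np.random.default_rng(0)
n = 700
group = rng.choice(3, size=n, p=[0.8, 0.1, 0.1])
size = rng.uniform(np.array([1_000, 10_000, 60_000])[group],
                   np.array([10_000, 60_000, 80_000])[group])
other = rng.uniform(1, 50, n)  # hypothetical second feature
X = np.column_stack([size, other])
y = (3.0 * size + 0.05 * size * other) * rng.lognormal(0.0, 0.1, n)

# MaxAbsScaler divides each column by max|x|, i.e. x / x_max for positive data.
model = make_pipeline(
    MaxAbsScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Lasso(alpha=1.0, max_iter=50_000),  # hypothetical alpha; tune by CV
).fit(X, y)

x_max = model.named_steps["maxabsscaler"].max_abs_
powers = model.named_steps["polynomialfeatures"].powers_  # exponents of each output term
lasso = model.named_steps["lasso"]

# Each term is prod_j (x_j / x_max_j) ** p_j, so its coefficient in
# original units is coef / prod_j x_max_j ** p_j.
coef_raw = lasso.coef_ / np.prod(x_max ** powers, axis=1)

# Rebuild predictions from raw (unscaled) polynomial terms as a sanity check.
raw_terms = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
pred_raw = raw_terms @ coef_raw + lasso.intercept_
print(np.allclose(pred_raw, model.predict(X)))
```

The final check compares predictions from the back-transformed coefficients on the raw features with the pipeline's own predictions. The two should match.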
I'm considering the following approaches:
- Segmenting the data into quartiles or percentiles based on predicted y-values and then applying a metalearner.
- Using LOESS (local regression) to better fit the data.
- Adopting a Lasso variant that uses MAPE as the cost function instead of MSE.
Which approach would best address this bias so that the model predicts well for both small and large stores? Any input is appreciated. Thanks!
(I've tried segmenting the dataset into quartiles by y-value and training four separate models, but I'm worried that the smaller per-segment models lose important information about the interactions between variables, since the whole dataset has only 700 observations.)
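A minimal sketch for checking that worry empirically, assuming scikit-learn and hypothetical synthetic data. It compares out-of-fold MAPE per segment for one pooled Lasso against one Lasso per quartile segment. Segments are assigned from store size here rather than from y (a hypothetical choice), because the true y of a new store isn't known at prediction time. `make_model` and `cv_mape_by_segment` are hypothetical helper names:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, PolynomialFeatures

# Hypothetical synthetic data in the proportions described in the post.
rng = np.random.default_rng(0)
n = 700
group = rng.choice(3, size=n, p=[0.8, 0.1, 0.1])
size = rng.uniform(np.array([1_000, 10_000, 60_000])[group],
                   np.array([10_000, 60_000, 80_000])[group])
other = rng.uniform(1, 50, n)
X = np.column_stack([size, other])
y = (3.0 * size + 0.05 * size * other) * rng.lognormal(0.0, 0.1, n)


def make_model():
    """Degree-2 Lasso with x_max scaling (hypothetical alpha)."""
    return make_pipeline(MaxAbsScaler(),
                         PolynomialFeatures(degree=2, include_bias=False),
                         Lasso(alpha=1.0, max_iter=50_000))


def cv_mape_by_segment(X, y, seg, n_splits=5, seed=0):
    """Out-of-fold MAPE per segment as (pooled model, one model per segment)."""
    pooled = np.empty(len(y))
    split = np.empty(len(y))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr, te in folds.split(X):
        # Pooled: one model trained on every store in the training fold.
        pooled[te] = make_model().fit(X[tr], y[tr]).predict(X[te])
        # Split: a separate model trained only on that segment's training stores.
        for s in np.unique(seg):
            tr_s, te_s = tr[seg[tr] == s], te[seg[te] == s]
            if len(te_s) > 0:
                split[te_s] = make_model().fit(X[tr_s], y[tr_s]).predict(X[te_s])
    ape_pooled = np.abs(y - pooled) / y
    ape_split = np.abs(y - split) / y
    return {int(s): (ape_pooled[seg == s].mean(), ape_split[seg == s].mean())
            for s in np.unique(seg)}


# Quartile segments of store size (0 = smallest quarter, 3 = largest).
seg = np.digitize(size, np.quantile(size, [0.25, 0.5, 0.75]))
for s, (pooled_mape, split_mape) in cv_mape_by_segment(X, y, seg).items():
    print(f"quartile {s}: pooled {pooled_mape:.3f}, per-segment {split_mape:.3f}")
```

If the per-segment column is not clearly better on the real data, that would suggest the pooled model's shared interaction terms are worth keeping.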