Python Regression of Categorical data with interactions

I have found how people do linear regressions in Python using sklearn by calling reg.fit() on their data, but this only works if you're looking for an additive regression like y = A*x1 + B*x2 + C*x3, etc.

But what if I had categorical data with some sort of interactions, so that I wanted the variables multiplied instead of added? Something like y = (A*x1)*(B*x2)*(C*x3).

There is 1 answer

Heapify On

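To take care of the interactions between input features such as x1, x2 and x3, the common practice is to create polynomial features, i.e. powers and products of the inputs such as x1^3, x1^2*x2, x1*x2*x3, ..., x3^3, and then fit an ordinary linear regression on those new features. The product terms (for example x1*x2*x3) are what capture the interactions. For example, in your case, a degree-3 equation for y would look like the following: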

y = A*x1^3 + B*x2^3 + C*x3^3 + D*x1^2*x2 + E*x1*x2*x3 + F*x1*x2^2 + ...

I hope you get the idea. To take care of the categorical data, there are techniques like one-hot encoding, which turns each category into its own 0/1 column and so gives a simple vector representation of your data. scikit-learn provides an implementation of one-hot encoding (`sklearn.preprocessing.OneHotEncoder`).

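If you want to take your learning to the next level, you can also look into models such as support vector machines and neural networks, which can learn non-linear relationships between features without you specifying the interaction terms by hand.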