I am new to machine learning and am working on a regression problem. Here is an overview of the problem:
- It has 6 variables in total: 5 features and 1 target. 4 of the features are categorical.
- I am using label encoding, and have also tried other encoding techniques.
- The correlation between every pair of variables is weak, as they are all independent of each other.
I have tried polynomial regression (up to 3rd degree), Lasso, and Ridge regression. The RMSE is between 1.48 and 1.50 for all of them, so they perform almost identically.
Sample code snippet:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Expand the training features into polynomial terms up to degree 3
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X_train)
# Fit an ordinary linear regression on the expanded features
lin2 = LinearRegression()
lin2.fit(X_poly, y_train)
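For a reproducible baseline, the comparison described above (categorical features, polynomial terms, then linear, Ridge, and Lasso models scored by RMSE) can be sketched end to end. This is a minimal sketch, not the actual code or data from this problem: the synthetic DataFrame, the column names, the `build` helper, the degree of 2, and the `alpha` values are all assumptions for illustration, and one-hot encoding stands in for one of the "other encoding techniques".

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

# Synthetic stand-in data (assumption): 4 categorical features, 1 numeric, 1 target.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "cat_a": rng.choice(["x", "y", "z"], n),
    "cat_b": rng.choice(["p", "q"], n),
    "cat_c": rng.choice(["u", "v", "w"], n),
    "cat_d": rng.choice(["m", "n"], n),
    "num": rng.normal(size=n),
})
y = 2.0 * df["num"] + (df["cat_a"] == "x") + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)

cat_cols = ["cat_a", "cat_b", "cat_c", "cat_d"]


def build(model):
    """Hypothetical helper: one-hot encode, add degree-2 terms, then fit `model`."""
    encode = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols)],
        remainder="passthrough",  # keep the numeric column unchanged
        sparse_threshold=0,       # always return a dense array
    )
    return make_pipeline(encode, PolynomialFeatures(degree=2), model)


models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),                    # alpha is an assumed value
    "lasso": Lasso(alpha=0.01, max_iter=10_000),  # alpha is an assumed value
}

# Fit each pipeline on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    pipe = build(model).fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = float(np.sqrt(mean_squared_error(y_test, pred)))

for name, rmse in results.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

Wrapping the encoder and `PolynomialFeatures` in one pipeline means both are fitted on the training split only and then reused unchanged on the test split.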
Can anyone from the community help me improve the model's performance? Should I use a neural network, or tune the hyperparameters of the algorithms I have already tried?
Any guidance would be greatly appreciated. Thank you.