I used statsmodels.formula.api.quantreg to fit quantile regression models and then tried to predict on the test set, but got an unexpected error:
```
AttributeError Traceback (most recent call last)
<ipython-input-34-12e0d345b0fc> in <module>
----> 1 test['ypredL'] = model1.predict( test ).values
      2 test['FVC'] = model2.predict( test ).values
      3 test['ypredH'] = model3.predict( test ).values
      4 test['Confidence'] = np.abs(test['ypredH'] - test['ypredL']) / 2
~\anaconda3\envs\knk\lib\site-packages\statsmodels\base\model.py in predict(self, exog, transform, *args, **kwargs)
   1081                    '\n\nThe original error message returned by patsy is:\n'
   1082                    '{0}'.format(str(str(exc))))
-> 1083             raise exc.__class__(msg)
   1084         if orig_exog_len > len(exog) and not is_dict:
   1085             import warnings
AttributeError: predict requires that you use a DataFrame when predicting from a model
that was created using the formula api.
The original error message returned by patsy is:
'DataFrame' object has no attribute 'dtype'
```
The puzzling part is that the same predict calls ran on the training set without any problem. Here is the training code:
```python
from statsmodels.formula.api import quantreg

# Fit the 25th, 50th and 75th percentile models on the training data
model1 = quantreg('FVC ~ Weeks+Percent+Age+Sex+SmokingStatus',
                  train).fit(q=0.25)
model2 = quantreg('FVC ~ Weeks+Percent+Age+Sex+SmokingStatus',
                  train).fit(q=0.5)
model3 = quantreg('FVC ~ Weeks+Percent+Age+Sex+SmokingStatus',
                  train).fit(q=0.75)

# Predicting on the training data itself works fine
train['y_predL'] = model1.predict(train).values
train['y_pred'] = model2.predict(train).values
train['y_predH'] = model3.predict(train).values
```
The patsy message 'DataFrame' object has no attribute 'dtype' is accurate but hard to interpret. What it actually points to is a dtype mismatch between the training set and the test set: a column used in the formula has a different dtype in the data passed to predict than in the data the model was fitted on. In this question the culprit was Weeks, which had dtype int in the training set but str in the test set, so converting test['Weeks'] to int before calling predict should resolve the error.
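A minimal sketch of how such a mismatch could be found and fixed, assuming train and test are pandas DataFrames. The helper names dtype_mismatches and align_dtypes, and the tiny toy frames, are hypothetical and made up for illustration; only the column names come from the formula in the question. The first helper lists formula columns whose dtypes differ between the two sets, and the second casts the test columns to the training dtypes.

```python
import pandas as pd

# Predictor columns taken from the formula 'FVC ~ Weeks+Percent+Age+Sex+SmokingStatus'.
PREDICTORS = ["Weeks", "Percent", "Age", "Sex", "SmokingStatus"]


def dtype_mismatches(train: pd.DataFrame, test: pd.DataFrame, columns) -> dict:
    """Hypothetical helper: map each column whose dtype differs to (train dtype, test dtype)."""
    return {
        col: (train[col].dtype, test[col].dtype)
        for col in columns
        if train[col].dtype != test[col].dtype
    }


def align_dtypes(train: pd.DataFrame, test: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `test` with columns cast to the training dtypes."""
    aligned = test.copy()
    for col in columns:
        if aligned[col].dtype != train[col].dtype:
            aligned[col] = aligned[col].astype(train[col].dtype)
    return aligned


# Toy data (made up for illustration): Weeks is int in train but str in test.
train = pd.DataFrame({
    "Weeks": [0, 5, 10],
    "Percent": [60.0, 58.5, 57.1],
    "Age": [70, 70, 70],
    "Sex": ["Male", "Male", "Male"],
    "SmokingStatus": ["Ex-smoker", "Ex-smoker", "Ex-smoker"],
})
test = pd.DataFrame({
    "Weeks": ["-12", "0", "3"],
    "Percent": [61.0, 60.2, 59.8],
    "Age": [70, 70, 70],
    "Sex": ["Male", "Male", "Male"],
    "SmokingStatus": ["Ex-smoker", "Ex-smoker", "Ex-smoker"],
})

print(dtype_mismatches(train, test, PREDICTORS))  # reports only Weeks
test_aligned = align_dtypes(train, test, PREDICTORS)
print(dtype_mismatches(train, test_aligned, PREDICTORS))  # {}
```

With the test frame aligned this way, the predict calls from the question (model1.predict(test) and so on) receive a Weeks column with the same dtype the model was fitted on. Note that the cast to int raises if the test column contains missing values or non-numeric strings, so those would need to be handled first.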