I am trying to use DoWhy's CausalModel together with EconML to estimate the effect of a variable on the different scenarios in the dataset below:
First, I import the following libraries:
import pandas as pd
import econml
import dowhy
from dowhy import CausalModel
I then load the dataset with pandas read_csv and name it "df".
After that, I define the causal model as follows:
model = CausalModel(data=df.fillna(0),
                    treatment='ai_host.disk.write.bytes',
                    outcome='scenario',
                    common_causes='col'
                    )
model.view_model()
This gives the following output:
After that, I generate the estimand:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
With the following output:
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
             d
───────────────────────────(Expectation(scenario|col))
d[ai_host.disk.write.bytes]
Estimand assumption 1, Unconfoundedness: If U→{ai_host.disk.write.bytes} and U→scenario then P(scenario|ai_host.disk.write.bytes,col,U) = P(scenario|ai_host.disk.write.bytes,col)
### Estimand : 2
Estimand name: iv
No such variable found!
### Estimand : 3
Estimand name: frontdoor
No such variable found!
Finally, I try to estimate the causal effect:
identified_estimand_experiment = model.identify_effect(proceed_when_unidentifiable=True)
from sklearn.ensemble import RandomForestRegressor
metalearner_estimate = model.estimate_effect(identified_estimand_experiment,
                                             method_name="backdoor.econml.metalearners.TLearner",
                                             confidence_intervals=False,
                                             method_params={
                                                 "init_params": {'models': RandomForestRegressor()},
                                                 "fit_params": {}
                                             })
print(metalearner_estimate)
But I get the following error each time:
ValueError Traceback (most recent call last)
<ipython-input-15-6f34377dbe77> in <module>()
8 method_params={
9 "init_params":{'models': RandomForestRegressor()},
---> 10 "fit_params":{}
11 })
12 print(metalearner_estimate)
7 frames
/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
140 " during transform".format(diff, i)
141 )
--> 142 raise ValueError(msg)
143 else:
144 if warn_on_unknown:
ValueError: Found unknown categories [0] in column 0 during transform
Could someone please help me understand and fix this error? Please also note that, in order to use EconML, you need Python 3.8 or lower.
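The final ValueError in the traceback is raised by scikit-learn's OneHotEncoder, which complains when it is asked to transform a value it did not see while fitting. One plausible reading (an assumption, not something confirmed in this thread) is that the metalearner one-hot encodes the treatment column and is then asked about a control value of 0 that never occurs in ai_host.disk.write.bytes. The sketch below reproduces only the encoder behaviour, with made-up treatment values; it does not call DoWhy or EconML.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical treatment values: a metric that never takes the value 0.
treatment_seen = np.array([[1200], [3400], [560]])

# handle_unknown defaults to "error", so unseen values are rejected.
encoder = OneHotEncoder()
encoder.fit(treatment_seen)

# Transforming a value that was absent during fit raises a ValueError.
try:
    encoder.transform(np.array([[0]]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# If 0 is among the fitted categories, the same transform succeeds.
encoder_with_zero = OneHotEncoder()
encoder_with_zero.fit(np.array([[0], [1200], [3400], [560]]))
encoded_zero = encoder_with_zero.transform(np.array([[0]])).toarray()
print(encoded_zero.shape)
```

If this reading is right, the treatment would need to be a discrete variable whose observed values include the control value before a metalearner can be fitted on it.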
I also ran into this problem, but when I used a linear regression model instead of the Random Forest Regressor metalearner, I had no issues.
This requires replacing
with
Other methods like using
and
might also be of interest.
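As a rough illustration of what a linear regression estimate of a backdoor effect computes, here is a minimal sketch using only NumPy on synthetic data. It is not DoWhy's implementation, and the variable names, coefficients, and sample size are all made up for the example. In DoWhy itself, the built-in linear regression estimator is selected with method_name="backdoor.linear_regression".

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic stand-ins (hypothetical): `col` influences both treatment and outcome.
col = rng.normal(size=n)
treatment = 2.0 * col + rng.normal(size=n)
true_effect = 1.5
outcome = true_effect * treatment + 3.0 * col + rng.normal(scale=0.1, size=n)

# Backdoor adjustment via ordinary least squares:
# regress the outcome on an intercept, the treatment, and the common cause.
design = np.column_stack([np.ones(n), treatment, col])
coefficients, *_ = np.linalg.lstsq(design, outcome, rcond=None)
estimated_effect = coefficients[1]

# For comparison, leaving out the common cause biases the estimate.
naive_design = np.column_stack([np.ones(n), treatment])
naive_coefficients, *_ = np.linalg.lstsq(naive_design, outcome, rcond=None)
naive_effect = naive_coefficients[1]

print(f"adjusted estimate: {estimated_effect:.3f}")
print(f"unadjusted estimate: {naive_effect:.3f}")
```

Because the treatment enters the regression as a plain numeric column, no one-hot encoding of its values is involved, which may be why a continuous treatment causes no trouble with this approach.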