I would like to combine different risk ratios into one forest plot. I would expect the output to be similar to metamiss
in STATA or metafor
in R. How can I do this in Python?
How can I create a forest plot?
9k views Asked by Safoora mk At
3
There are 3 answers
0
On
The statsmodels library has an API for doing simple meta-analysis and plotting forest plots. It supports DerSimonian-Laird (chi2) and Paule-Mandel (iterated). See the statsmodel docs for more use cases, options and examples.
An example from their docs:
import numpy as np
from statsmodels.stats.meta_analysis import combine_effects
# dummy data
mean_effect = np.array([61.00,61.40,62.21,62.30,62.34,62.60,62.70,62.84,65.90])
var_effect = np.array([0.2025,1.2100,0.0900,0.2025,0.3844,0.5625,0.0676,0.0225,1.8225])
idx = ['lab1','lab2','lab3','lab4','lab5','lab6','lab7','lab8','lab9']
# meta-analysis and forest plot
results = combine_effects(mean_effect, var_effect, method_re="chi2", use_t=True, row_names=idx)
print(results.summary_frame())
fig = results.plot_forest()
Output:
eff sd_eff ci_low ci_upp w_fe w_re
lab1 61.000000 0.450000 60.118016 61.881984 0.057436 0.123113
lab2 61.400000 1.100000 59.244040 63.555960 0.009612 0.040314
lab3 62.210000 0.300000 61.622011 62.797989 0.129230 0.159749
lab4 62.300000 0.450000 61.418016 63.181984 0.057436 0.123113
lab5 62.340000 0.620000 61.124822 63.555178 0.030257 0.089810
lab6 62.600000 0.750000 61.130027 64.069973 0.020677 0.071005
lab7 62.700000 0.260000 62.190409 63.209591 0.172052 0.169810
lab8 62.840000 0.150000 62.546005 63.133995 0.516920 0.194471
lab9 65.900000 1.350000 63.254049 68.545951 0.006382 0.028615
fixed effect 62.583397 0.107846 62.334704 62.832090 1.000000 NaN
random effect 62.390139 0.245750 61.823439 62.956838 NaN 1.000000
fixed effect wls 62.583397 0.189889 62.145512 63.021282 1.000000 NaN
random effect wls 62.390139 0.294776 61.710384 63.069893 NaN 1.000000
I’d also recommend having a read through the docs for the individual methods such as combine_effects()
which contains additional notes and references regarding the implementation.
0
On
Since I haven't found a customizable package to create a forest plot, I developed myforestplot for that purpose.
The following is one example of a forest plot using titanic dataset.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import matplotlib.pyplot as plt
import myforestplot as mfp
data = (pd.read_csv("titanic.csv")
[["survived", "pclass", "sex", "age", "embark_town"]]
.dropna()
)
ser = data["age"]
data["age"] = (ser
.mask(ser >= 40, "40 or more")
.mask(ser < 40, "20_39")
.mask(ser <20, "0_19")
)
res = smf.logit("survived ~ sex + age + embark_town", data=data).fit()
order = ["age", "sex", "embark_town"]
cont_cols = []
item_order = {"embark_town": ['Southampton', 'Cherbourg', 'Queenstown'],
"age": ["0_19", "20_39", "40 or more"]
}
df = mfp.statsmodels_pretty_result_dataframe(data, res,
order=order,
cont_cols=cont_cols,
item_order=item_order,
fml=".3f",
)
df
is a dataframe for creating a forest plot.
plt.rcParams["font.size"] = 8
fp = mfp.SimpleForestPlot(ratio=(8,3), dpi=150, figsize=(5,3), df=df,
vertical_align=True)
fp.errorbar(errorbar_kwds=None, log_scale=True)
xticklabels = [0.1, 0.5, 1.0, 2.0]
fp.ax2.set_xlim(np.log([0.1, 1.5]))
fp.ax2.set_xticks(np.log(xticklabels))
fp.ax2.set_xticklabels(xticklabels)
fp.ax2.set_xlabel("OR (log scale)")
fp.ax2.axvline(x=0, ymin=0, ymax=1.0, color="black", alpha=0.5)
fp.ax1.set_xlim([0.35, 1])
fp.embed_cate_strings("category", 0.3, header="Category",
text_kwds=dict(fontweight="bold"),
header_kwds=dict(fontweight="bold")
)
fp.embed_strings("item", 0.36, header="", replace={"age":""})
fp.embed_strings("nobs", 0.60, header="N")
fp.embed_strings("risk_pretty", 0.72, header="OR (95% CI)")
fp.horizontal_variable_separators()
fp.draw_outer_marker(log_scale=True, scale=0.008)
plt.show()
and we obtain the figure. A forest plot image
By using the zEPID package I create a forest plot of different risk ratios.