I would like to combine different risk ratios into one forest plot. I would expect the output to be similar to metamiss in STATA or metafor in R. How can I do this in Python?
How can I create a forest plot?
9k views Asked by Safoora mk At
        	3
        	
        There are 3 answers
0
                 On
                        
                            
                        
                        
                            On
                            
                            
                                                    
                    
                The statsmodels library has an API for doing simple meta-analysis and plotting forest plots. It supports DerSimonian-Laird (chi2) and Paule-Mandel (iterated). See the statsmodel docs for more use cases, options and examples.
An example from their docs:
import numpy as np
from statsmodels.stats.meta_analysis import combine_effects
# dummy data
mean_effect = np.array([61.00,61.40,62.21,62.30,62.34,62.60,62.70,62.84,65.90])
var_effect = np.array([0.2025,1.2100,0.0900,0.2025,0.3844,0.5625,0.0676,0.0225,1.8225])
idx = ['lab1','lab2','lab3','lab4','lab5','lab6','lab7','lab8','lab9']
# meta-analysis and forest plot
results = combine_effects(mean_effect, var_effect, method_re="chi2", use_t=True, row_names=idx)
print(results.summary_frame())
fig = results.plot_forest()
Output:
                         eff    sd_eff     ci_low     ci_upp      w_fe      w_re
lab1               61.000000  0.450000  60.118016  61.881984  0.057436  0.123113
lab2               61.400000  1.100000  59.244040  63.555960  0.009612  0.040314
lab3               62.210000  0.300000  61.622011  62.797989  0.129230  0.159749
lab4               62.300000  0.450000  61.418016  63.181984  0.057436  0.123113
lab5               62.340000  0.620000  61.124822  63.555178  0.030257  0.089810
lab6               62.600000  0.750000  61.130027  64.069973  0.020677  0.071005
lab7               62.700000  0.260000  62.190409  63.209591  0.172052  0.169810
lab8               62.840000  0.150000  62.546005  63.133995  0.516920  0.194471
lab9               65.900000  1.350000  63.254049  68.545951  0.006382  0.028615
fixed effect       62.583397  0.107846  62.334704  62.832090  1.000000       NaN
random effect      62.390139  0.245750  61.823439  62.956838       NaN  1.000000
fixed effect wls   62.583397  0.189889  62.145512  63.021282  1.000000       NaN
random effect wls  62.390139  0.294776  61.710384  63.069893       NaN  1.000000
I’d also recommend having a read through the docs for the individual methods such as combine_effects() which contains additional notes and references regarding the implementation.
0
                 On
                        
                            
                        
                        
                            On
                            
                            
                                                    
                    
                Since I haven't found a customizable package to create a forest plot, I developed myforestplot for that purpose.
The following is one example of a forest plot using titanic dataset.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import matplotlib.pyplot as plt
import myforestplot as mfp
data = (pd.read_csv("titanic.csv")
      [["survived", "pclass", "sex", "age", "embark_town"]]
      .dropna()
      )
ser = data["age"]
data["age"] = (ser
             .mask(ser >= 40, "40 or more")
             .mask(ser < 40, "20_39")
             .mask(ser <20, "0_19")
             )
res = smf.logit("survived ~ sex + age + embark_town", data=data).fit()
order = ["age", "sex", "embark_town"]
cont_cols = []
item_order = {"embark_town": ['Southampton', 'Cherbourg', 'Queenstown'],
              "age": ["0_19", "20_39", "40 or more"]
              }
df = mfp.statsmodels_pretty_result_dataframe(data, res,
                                             order=order,
                                             cont_cols=cont_cols,
                                             item_order=item_order,
                                             fml=".3f",
                                             )
df is a dataframe for creating a forest plot.
plt.rcParams["font.size"] = 8
fp = mfp.SimpleForestPlot(ratio=(8,3), dpi=150, figsize=(5,3), df=df,
                          vertical_align=True)
fp.errorbar(errorbar_kwds=None, log_scale=True)
xticklabels = [0.1, 0.5, 1.0, 2.0]
fp.ax2.set_xlim(np.log([0.1, 1.5]))
fp.ax2.set_xticks(np.log(xticklabels))
fp.ax2.set_xticklabels(xticklabels)
fp.ax2.set_xlabel("OR (log scale)")
fp.ax2.axvline(x=0, ymin=0, ymax=1.0, color="black", alpha=0.5)
fp.ax1.set_xlim([0.35, 1])
fp.embed_cate_strings("category", 0.3, header="Category",
                      text_kwds=dict(fontweight="bold"),
                      header_kwds=dict(fontweight="bold")
                      )
fp.embed_strings("item", 0.36, header="", replace={"age":""})
fp.embed_strings("nobs", 0.60, header="N")
fp.embed_strings("risk_pretty", 0.72, header="OR (95% CI)")
fp.horizontal_variable_separators()
fp.draw_outer_marker(log_scale=True, scale=0.008)
plt.show()
and we obtain the figure. A forest plot image

By using the zEPID package I create a forest plot of different risk ratios.