ConvergenceError: Convergence halted due to matrix inversion problems


I keep getting error messages whenever I try to run a CoxPH regression in Python. I'm not a pro in Python; I'm still learning.

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test   
from lifelines.statistics import logrank_test
from lifelines import CoxPHFitter
import pyreadstat

After loading the data

data["faculty2"] = data["faculty2"].astype(int)
data["sex"] = data["sex"].astype(int)
data["mos"] = data["mos"].astype(int)
data["state2"] = data["state2"].astype(int)
data["ss"] = data["ss"].astype(int)
data["supervisor"] = data["supervisor"].astype(int)
data["time"] = data["time"].astype(int)
data["event"] = data["event"].astype(int)

Eventvar = data['event']
Timevar = data['time']

# Assign labels to the coded values
data['sex'] = data['sex'].apply({1: 'Male', 0: 'female'}.get)
data['faculty2'] = data['faculty2'].apply({1: 'Arts', 2: 'Sciences', 3: 'Medicals',
                                           4: 'Agriculture', 5: 'Social Sciences', 6: 'Education',
                                           7: 'Tech', 8: 'Law', 9: 'Institues'}.get)
data['state2'] = data['state2'].apply({1: 'SW', 2: 'SS', 3: 'SE', 4: 'NC', 5: 'NE', 6: 'NW'}.get)
data['ss'] = data['ss'].apply({1: 'Yes', 0: 'No'}.get)
data['mos'] = data['mos'].apply({1: 'Full Time', 0: 'Part Time'}.get)

cf = CoxPHFitter()
cf.fit(data, 'time', event_col='event',show_progress=True)
cf.print_summary()

I get this error message when I run this code:

ValueError: could not convert string to float: 'Arts'

Please, I need help; I don't know how to go about this. If I add dummy variables, I get a different error message:

ohe_features = ['faculty2', 'sex', 'mos','state2','ss'] 
data = pd.get_dummies(data,drop_first=True,columns=ohe_features)

This is the error message I get

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model Matrix is singular

If I run the code without assigning labels to the values and without adding dummies, it runs, but the different levels are not shown. The variables are treated as though they were continuous.

Here is the data


There are 2 answers

Yesid Fernando Orjuela Orozco On

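For convergence problems like this, the lifelines documentation suggests:

  1. Add the penalizer parameter to CoxPHFitter,
  2. Use the variance inflation factor (VIF) to find redundant variables, or
  3. Check the correlation matrix of your dataset for highly correlated columns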

https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model
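
Suggestions 2 and 3 can be tried with just pandas and NumPy. The following is a minimal sketch, not the method from the lifelines docs: the toy DataFrame, the helper names high_correlations and vif, and the 0.9 threshold are all assumptions made for illustration. It uses the standard identity that the VIF of each column is the corresponding diagonal entry of the inverse correlation matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's covariates.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "sex": rng.integers(0, 2, n),
    "mos": rng.integers(0, 2, n),
    "faculty2": rng.integers(1, 4, n),
})
# Deliberately near-redundant column: equal to "sex" except in 3 rows.
ss = toy["sex"].copy()
ss.iloc[:3] = 1 - ss.iloc[:3]
toy["ss"] = ss

# Dummy-encode the multi-level column; dtype=float keeps everything numeric.
X = pd.get_dummies(toy, columns=["faculty2"], drop_first=True, dtype=float)


def high_correlations(frame, threshold=0.9):
    """Return (col_a, col_b, |r|) for column pairs with |r| >= threshold."""
    corr = frame.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] >= threshold:
                pairs.append((a, b, float(corr.loc[a, b])))
    return pairs


def vif(frame):
    """VIF per column: the diagonal of the inverse correlation matrix.

    Constant or exactly duplicated columns make this matrix undefined or
    singular, so drop those before calling.
    """
    corr = frame.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)


pairs = high_correlations(X)
vifs = vif(X)
print(pairs)
print(vifs.sort_values(ascending=False))
```

On real data, X would be the dummy-encoded covariates with the time and event columns dropped. Columns that appear in a flagged pair or have a very large VIF are candidates to drop or merge before fitting.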

S Ghosh On

I had a nearly identical problem. I changed

cph = CoxPHFitter()

to

cph = CoxPHFitter(penalizer=0.0001)

This solved the issue.