I keep getting error messages whenever I try to run a CoxPH regression in Python. I'm not a pro in Python; I'm still learning.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test
from lifelines.statistics import logrank_test
from lifelines import CoxPHFitter
import pyreadstat
After loading the data, I convert the columns to integers:
data["faculty2"] = data["faculty2"].astype(int)
data["sex"] = data["sex"].astype(int)
data["mos"] = data["mos"].astype(int)
data["state2"] = data["state2"].astype(int)
data["ss"] = data["ss"].astype(int)
data["supervisor"] = data["supervisor"].astype(int)
data["time"] = data["time"].astype(int)
data["event"] = data["event"].astype(int)
Eventvar = data['event']
Timevar = data['time']
# Assign labels to the coded values
data['sex'] = data['sex'].apply({1:'Male', 0:'female'}.get)
data['faculty2'] = data['faculty2'].apply({1:'Arts',2:'Sciences',3:'Medicals',\
4:'Agriculture', 5:'Social Sciences',6:'Education',\
7:'Tech',8:'Law',9:'Institues'}.get)
data['state2'] = data['state2'].apply({1:'SW',2:'SS',3:'SE',4:'NC', 5:'NE',6:'NW'}.get)
data['ss'] = data['ss'].apply({1:'Yes', 0:'No'}.get)
data['mos'] = data['mos'].apply({1:'Full Time', 0:'Part Time'}.get)
cf = CoxPHFitter()
cf.fit(data, 'time', event_col='event',show_progress=True)
cf.print_summary()
I get this error message when I run this code:
ValueError: could not convert string to float: 'Arts'
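The message suggests that the fitter tries to convert every column to a float and fails on the label strings. Here is a minimal sketch that reproduces the same kind of failure with pandas alone and shows that dummy-coded columns convert cleanly. The `toy` frame and its values are hypothetical stand-ins, not the real data, and the sketch assumes that a plain float conversion is what fails inside the fit.

```python
import pandas as pd

# Hypothetical toy data standing in for the real data set.
toy = pd.DataFrame({
    "time": [5, 8, 12, 3],
    "event": [1, 0, 1, 1],
    "faculty2": ["Arts", "Sciences", "Arts", "Law"],
})

# Converting a frame that still holds label strings to float fails.
try:
    toy.astype(float)
    message = None
except ValueError as err:
    message = str(err)
print(message)

# Dummy coding replaces the string column with 0/1 indicator columns,
# dropping the first level ("Arts") as the reference category.
encoded = pd.get_dummies(toy, columns=["faculty2"], drop_first=True, dtype=int)
print(encoded.dtypes)

# Every column is now numeric, so the float conversion succeeds.
numeric = encoded.astype(float)
```

With dummy coding, each remaining level gets its own column, so it would get its own coefficient relative to the dropped reference level.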
Please, I need help; I don't know how to go about this. If I add dummy variables, I get a different error message:
ohe_features = ['faculty2', 'sex', 'mos','state2','ss']
data = pd.get_dummies(data,drop_first=True,columns=ohe_features)
This is the error message I get:
ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model Matrix is singular
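Since the error points at collinearity, one way to look for it is to check the dummy-coded covariates for constant columns, duplicated columns, and a rank deficit. The sketch below uses a hypothetical matrix `X` that is deliberately built to show each problem; the column names only mimic what `pd.get_dummies` would produce and are not taken from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy-coded covariates (time and event columns excluded).
# "ss_Yes" duplicates "mos_Full Time" on purpose, and "state2_NW" is all zeros.
X = pd.DataFrame({
    "sex_Male":      [1, 0, 1, 0, 1, 0],
    "mos_Full Time": [1, 1, 0, 0, 1, 0],
    "ss_Yes":        [1, 1, 0, 0, 1, 0],
    "state2_NW":     [0, 0, 0, 0, 0, 0],
})

# Columns with a single distinct value carry no information.
constant_cols = [c for c in X.columns if X[c].nunique() <= 1]

# Columns that are exact copies of an earlier column.
duplicate_cols = X.columns[X.T.duplicated().to_numpy()].tolist()

# Rank of the mean-centered matrix; centering makes constant columns
# count as redundant too. A rank below the number of columns means
# some covariates are linear combinations of others.
centered = (X - X.mean()).to_numpy(dtype=float)
rank = np.linalg.matrix_rank(centered)

print(constant_cols, duplicate_cols, rank, X.shape[1])
```

If the rank comes out smaller than the number of covariate columns, dropping the flagged columns (or merging very rare levels) before fitting is one thing to try.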
If I run the code without assigning labels to the values and without adding dummies, it runs, but the different levels are not shown. The variables are treated as though they were continuous.
The lifelines documentation gives suggestions for this error here:
https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model