I have a genetic dataset in which the index of each row is the name of the gene. I would also like to find the row number of any given gene, so that I can look at genes individually after they have gone through a machine learning model's prediction and interpret each gene's prediction with SHAP. The way I currently code the SHAP plot needs a row number to pull out a specific gene.
My data looks like this:
Index   Feature1   Feature2   ...   FeatureN
Gene1   1          0.2        ...   10
Gene2   1          0.1        ...   7
Gene3   0          0.3        ...   10
For example, if I want to pull out and view the model prediction for Gene3, I do this:
import shap
import xgboost
shap.initjs()
xgbr = xgboost.XGBRegressor()  # assumed to be fitted on X_train elsewhere (not shown)
def shap_plot(j):
    # j is a 0-based row position in X_train, not a gene name
    explainerModel = shap.TreeExplainer(xgbr)
    shap_values_Model = explainerModel.shap_values(X_train)
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], X_train.iloc[[j]], feature_names=df.columns)
    return p
shap_plot(3)
Calling shap_plot(3) is a problem for me because I do not actually know whether the gene I want is in row 3 of the shuffled training or testing data.
Is there a way to get the row number from a known gene index? Or could I re-code my SHAP plot so that it accepts my string indices? I have a biology background, so any guidance would be appreciated.
 
                        
Try the following. Here df is your dataframe, and result gives the row number of a given gene (the first row gives 1, the second 2, and so on).
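The lookup described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code the answer had in mind: the toy values in df mirror the question's table, and gene_row_number is a hypothetical helper name. It relies on pandas' Index.get_loc, which returns the 0-based position of a label, and it assumes every gene name appears only once in the index.

```python
import pandas as pd

# Toy dataframe mirroring the layout in the question (values are illustrative).
df = pd.DataFrame(
    {"Feature1": [1, 1, 0], "Feature2": [0.2, 0.1, 0.3], "FeatureN": [10, 7, 10]},
    index=["Gene1", "Gene2", "Gene3"],
)


def gene_row_number(frame, gene):
    """Hypothetical helper: 1-based row number of `gene` in `frame`."""
    if not frame.index.is_unique:
        # With duplicate labels get_loc returns a slice or mask, not one position.
        raise ValueError("gene names in the index must be unique")
    # get_loc gives the 0-based position; add 1 so the first row is 1.
    return int(frame.index.get_loc(gene)) + 1


result = gene_row_number(df, "Gene3")

# For shuffled data, look the gene up in the frame you actually plot from.
# Here a shuffled copy stands in for X_train (an assumption for illustration).
X_train = df.sample(frac=1, random_state=0)
pos = int(X_train.index.get_loc("Gene3"))  # 0-based, suitable for .iloc
row = X_train.iloc[[pos]]
```

Since shap_plot in the question indexes with .iloc, it expects the 0-based position, so pass pos (or result - 1) rather than result. This should carry over to a real train/test split as long as the split keeps the gene names as the index of X_train.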