I have a dataframe with numerical variables, such as age and length of hospital stay, and categorical variables, such as gender and outcome (Positive or Negative). The 'outcome' variable is imbalanced, e.g., Positive = 91 and Negative = 604. How can I deal with the imbalance without changing the descriptive statistics of the numerical variables?
I've tried several techniques for handling imbalanced data (oversampling and SMOTE in Python), but they alter the descriptive statistics of the numerical variables (mean, standard deviation, and quartiles).
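
A minimal sketch of the effect, for illustration only. It uses synthetic data with the same class counts as above, not the real dataset: the column names `age` and `length_of_stay`, the distributions, and the helper `random_oversample` are all assumptions. It uses plain random oversampling with pandas rather than SMOTE, and compares `describe()` before and after resampling.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumed distributions):
# 91 Positive rows and 604 Negative rows, as in the question.
n_pos, n_neg = 91, 604
df = pd.DataFrame({
    "age": np.concatenate([rng.normal(70, 8, n_pos), rng.normal(50, 12, n_neg)]),
    "length_of_stay": np.concatenate([rng.gamma(4.0, 3.0, n_pos), rng.gamma(2.0, 2.0, n_neg)]),
    "outcome": ["Positive"] * n_pos + ["Negative"] * n_neg,
})


def random_oversample(data, target, seed=0):
    """Hypothetical helper: resample each smaller class with replacement
    until every class has as many rows as the largest class."""
    counts = data[target].value_counts()
    n_max = int(counts.max())
    parts = []
    for label, n in counts.items():
        group = data[data[target] == label]
        if n < n_max:
            extra = group.sample(n=n_max - int(n), replace=True, random_state=seed)
            group = pd.concat([group, extra])
        parts.append(group)
    return pd.concat(parts, ignore_index=True)


balanced = random_oversample(df, "outcome")

# Side-by-side descriptive statistics of the numerical columns.
numeric_cols = ["age", "length_of_stay"]
comparison = pd.concat(
    {
        "original": df[numeric_cols].describe(),
        "oversampled": balanced[numeric_cols].describe(),
    },
    axis=1,
)

print(balanced["outcome"].value_counts())
print(comparison.round(2))
```

In this sketch the Positive group is generated with a higher mean age, so duplicating its rows pulls the overall mean towards that group. The statistics of the untouched Negative class stay the same; only the pooled statistics shift.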