I am running a clustering model on a group of hypertensive patients, hoping to identify distinct patterns of clinical characteristics among them.
One issue is that I filtered out all non-hypertensive patients first and only then preprocessed the data.
I had planned to fit a random-forest model with Hypertension as the response variable, select the top 10 features, and then run unsupervised clustering on them. However, I now realize this is not possible: the non-hypertensive patients are no longer in the dataset, so the response variable has only one class.
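
To make the problem concrete, here is a minimal sketch of why a tree-based model cannot rank features once only one class remains. The toy blood-pressure values, the threshold, and the helper names `gini` and `split_gain` are illustrative assumptions, not taken from my actual dataset or from any library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(values, labels, threshold):
    """Impurity decrease from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical toy data: systolic blood pressure for six patients.
systolic = [118, 122, 125, 150, 160, 170]

# Both classes present: the split separates them, so the gain is positive.
mixed_labels = [0, 0, 0, 1, 1, 1]
gain_mixed = split_gain(systolic, mixed_labels, threshold=130)

# Only hypertensive patients remain: the response is constant,
# so no split on any feature can reduce impurity.
constant_labels = [1, 1, 1, 1, 1, 1]
gain_constant = split_gain(systolic, constant_labels, threshold=130)

print(gain_mixed, gain_constant)
```

Impurity-based feature importance is built from these per-split gains, so with a constant response every feature ends up with the same importance of zero and there is nothing to rank.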
Is there a better way to select the most important variables in my scenario?