Computing similarity matrix with mixed data

255 views Asked by At

I have asked this question also on "Cross Validated" forum, but with no answer so far, so I am trying also here:

I would like to compute similarity matrix (which I will further use for clustering purposes) from my data (failure data from automotive company). The data consist of these variables:

START DATE + TIME (dd/mm/yyyy hh/mm/ss), DURATION (in seconds), DAY OF THE WEEK (mon,tue,...), WORKING TEAM (1,2,3), LOCALIZATION (1,2,3,...,20), FAILURE TYPE

From this, it is clear, that there are continuous and categorical data. What method would you suggest to calculate similarities between failure types? I think I can not use Euclidean distance, or Gowe's similarity. Thank you in advance.

1

There are 1 answers

1
Malcolm McLean On

No, you need an ad hoc function that represents your knowledge about what the data means in the real world. Presumably it will be mainly applying a weight to a continuous difference, and a 2D simple matrix for the discrete categorical variables. But don't rule our censoring of extreme values or fuzzification.