Computing similarity matrix with mixed data

Question

362 views Asked by Martin Nemeth At 07 January 2017 at 18:55

I have also asked this question on the "Cross Validated" forum, but it has received no answer so far, so I am trying here as well:

I would like to compute a similarity matrix (which I will then use for clustering) from my data (failure data from an automotive company). The data consist of these variables:

START DATE + TIME (dd/mm/yyyy hh/mm/ss), DURATION (in seconds), DAY OF THE WEEK (mon,tue,...), WORKING TEAM (1,2,3), LOCALIZATION (1,2,3,...,20), FAILURE TYPE

From this it is clear that the data contain both continuous and categorical variables. What method would you suggest for calculating similarities between failure types? I think I cannot use Euclidean distance or Gower's similarity. Thank you in advance.

There is 1 answer

**Malcolm McLean** · Answer 1 · 2017-01-07T19:09:45+00:00

No, you need an ad hoc function that represents your knowledge of what the data mean in the real world. Presumably it will mainly consist of a weight applied to each continuous difference, plus a simple 2D matrix of pairwise values for each discrete categorical variable. But don't rule out censoring of extreme values or fuzzification.
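To make the idea concrete, here is a minimal sketch in Python with NumPy, not a definitive implementation. It handles only two of the variables (DURATION and WORKING TEAM). Everything specific in it is an assumption for illustration: the function name `mixed_distance_matrix`, the weights, the percentile used to censor extreme durations, the sample values, and above all the entries of the team dissimilarity matrix, which you would have to fill in from your own domain knowledge.

```python
import numpy as np

def mixed_distance_matrix(duration, team, team_dissim,
                          w_duration=1.0, w_team=1.0, clip_pct=95):
    """Pairwise distances in [0, 1] from one continuous and one categorical variable.

    team holds zero-based category codes; team_dissim is a symmetric 2D matrix
    with a zero diagonal and entries in [0, 1] (hypothetical, hand-chosen values).
    """
    duration = np.asarray(duration, dtype=float)
    team = np.asarray(team, dtype=int)
    team_dissim = np.asarray(team_dissim, dtype=float)

    # Censor extreme durations at the chosen percentile.
    capped = np.minimum(duration, np.percentile(duration, clip_pct))

    # Rescale to [0, 1] so the weight is comparable with the categorical term.
    span = capped.max() - capped.min()
    scaled = (capped - capped.min()) / span if span > 0 else np.zeros_like(capped)

    # Continuous part: absolute difference for every pair of records.
    cont = np.abs(scaled[:, None] - scaled[None, :])

    # Categorical part: look each pair of codes up in the 2D matrix.
    cat = team_dissim[team[:, None], team[None, :]]

    # Weighted average of the two parts.
    return (w_duration * cont + w_team * cat) / (w_duration + w_team)

# Made-up example records: durations in seconds and team codes 0, 1, 2.
duration = [10, 20, 5000, 15, 10]
team = [0, 1, 2, 0, 0]

# Assumed knowledge: teams 0 and 1 are fairly alike, team 2 differs from both.
team_dissim = np.array([[0.0, 0.5, 1.0],
                        [0.5, 0.0, 1.0],
                        [1.0, 1.0, 0.0]])

D = mixed_distance_matrix(duration, team, team_dissim, w_duration=2.0, w_team=1.0)
S = 1.0 - D  # similarity matrix
```

The same pattern extends to the other variables by adding one weighted term per variable, with a separate 2D matrix for each categorical one. Many clustering routines expect a distance matrix such as `D` rather than the similarity `S`, so check which one your method needs.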
