In a group of correlated variables, how can I deduce which subset of variables best describes the remaining variables?

Question

116 views Asked by Alexander Fokas At 03 January 2017 at 19:11

I have a data set of 60 sensors, each making 1684 measurements. I wish to decrease the number of sensors used during the experiment, and use the data from the remaining sensors to predict (using machine learning) the readings of the removed sensors.

I have had a look at the data (see image) and uncovered several strong correlations between the sensors, which should make it possible to remove X sensors and use the remaining sensors to predict their behaviour.

How can I “score” which set of retained sensors (60-X) best predicts the removed set (X)?

Original Q&A

There is 1 answer

**Prune** · Answer 1 · 2017-01-03T21:18:44+00:00

Are you familiar with Principal Component Analysis (PCA)? It is a dimensionality reduction technique that re-expresses a set of correlated variables as a smaller set of uncorrelated components.
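As a rough illustration of what PCA can tell you here, the sketch below estimates how many independent directions the sensor data actually spans. It assumes the measurements sit in a NumPy array of shape (measurements, sensors) with no constant columns; the function names and the 95% target are placeholders chosen for this example, not anything from the question. Note that this only suggests how many sensors you might need to keep, not which ones.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of total variance captured by each principal component.

    data: array of shape (n_measurements, n_sensors).
    Assumes no sensor is constant (its standard deviation must be non-zero).
    """
    # Standardise each sensor so that none dominates because of its units.
    centered = data - data.mean(axis=0)
    scaled = centered / centered.std(axis=0)
    # Squared singular values are proportional to the component variances.
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def components_needed(data, target=0.95):
    """Smallest number of components whose cumulative variance reaches target."""
    cumulative = np.cumsum(explained_variance_ratio(data))
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, data.shape[1])
```

If `components_needed` returns, say, a number far below 60 for your data, that is a hint that a comparably small subset of sensors may carry most of the information.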

These methods usually target a set of inputs that predict a single output, rather than a set of peer measurements. To adapt your case to them, I would begin by treating each of the 60 sensors, in turn, as the "ground truth", to see which ones can be most reliably predicted from the remainder. Remove the most predictable sensor and repeat the process until the predictions fall below your desired threshold of correlation.
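The elimination loop described above could be sketched as follows. This is a minimal version under several assumptions: plain least-squares regression stands in for whatever model you would actually use, the score is an in-sample R² (a held-out split would be more honest), and the names `r_squared`, `greedy_sensor_removal` and the `min_r2` threshold are made up for this example. The extra check that earlier removals stay predictable is also an addition of this sketch.

```python
import numpy as np

def r_squared(predictors, target):
    """R^2 of a least-squares fit of target on predictors (with intercept)."""
    design = np.column_stack([predictors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    total = target - target.mean()
    return 1.0 - (residual @ residual) / (total @ total)

def greedy_sensor_removal(data, min_r2=0.95):
    """Repeatedly drop the sensor best predicted by the sensors still kept.

    data: array of shape (n_measurements, n_sensors).
    Returns (kept, removed) as lists of column indices.
    """
    kept = list(range(data.shape[1]))
    removed = []
    while len(kept) > 1:
        # Score every kept sensor as the "ground truth" predicted by the rest.
        scores = {}
        for candidate in kept:
            others = [s for s in kept if s != candidate]
            scores[candidate] = r_squared(data[:, others], data[:, candidate])
        best = max(scores, key=scores.get)
        if scores[best] < min_r2:
            break
        # Assumption of this sketch: sensors removed earlier must remain
        # predictable from the reduced set, otherwise stop.
        remaining = [s for s in kept if s != best]
        if any(r_squared(data[:, remaining], data[:, r]) < min_r2
               for r in removed):
            break
        kept = remaining
        removed.append(best)
    return kept, removed
```

Calling `greedy_sensor_removal(data)` on the 1684 x 60 array would return the indices of the sensors to keep and the order in which the others were dropped; the lowest R² among the removed sensors is one possible "score" for the retained set.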

I also suggest a genetic algorithm for this winnowing; random forests might be of help in this phase as well.
