Dimension reduction for logical arrays


I have measurements of 5 devices at two different points in time. Each measurement is an array of ones and zeros, where each entry is the bit value at the corresponding location:

whos measurement1_dev1_time1

Name                         Size               Bytes  Class      Attributes

measurement1_dev1_time1      4096x8             32768  logical

I assume that, for a specific device, the changes in the measurements between time 1 and time 2 are unique. However, since I am dealing with 32768 bits at different locations, it is hard to visualize whether there is some kind of dependency.
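A minimal sketch of how the bit changes between the two times could be isolated for one device with `xor`. The data are random stand-ins, `m_t1`/`m_t2` are hypothetical names, and the 5% flip rate is an assumption chosen only for illustration.

```matlab
% Hypothetical stand-ins for two 4096x8 logical measurements of one device
rng(0);                          % reproducible random data
m_t1 = rand(4096, 8) > 0.5;      % plays the role of measurement1_dev1_time1
m_t2 = m_t1;
flip = rand(4096, 8) < 0.05;     % assumption: about 5% of bits flip between t1 and t2
m_t2(flip) = ~m_t2(flip);

% Bits that differ between time 1 and time 2
changed = xor(m_t1, m_t2);       % 4096x8 logical change mask
num_changed = nnz(changed);      % how many of the 32768 bits flipped
frac_changed = num_changed / numel(changed);
```

Plotting the mask, e.g. `imagesc(changed)` or `spy(changed)`, shows where the flips are located; the masks of different devices could then be compared with each other instead of the raw measurements.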

Since every bit at location x can be regarded as one dimension of an observation, I thought of using PCA to reduce the number of dimensions.
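Treating every bit as one dimension means each 4096x8 measurement becomes a single 1x32768 row vector. A minimal sketch with random stand-in data (the cell array `meas` and the loop are illustrative assumptions) of building the observation matrix in the orientation `pca()` expects, i.e. rows are observations and columns are variables:

```matlab
% Hypothetical: one random 4096x8 logical measurement per device
rng(3);
num_devices = 5;
meas = cell(num_devices, 1);
for d = 1:num_devices
    meas{d} = rand(4096, 8) > 0.5;
end

% Each measurement becomes one observation (a 1x32768 row vector);
% column j holds the bit at linear (column-major) index j of the 4096x8 array.
X = zeros(num_devices, 4096*8);
for d = 1:num_devices
    X(d, :) = double(meas{d}(:)');
end
% Rows = observations (devices), columns = variables (bit positions)
```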

Thus, for each of the 5 devices:

  1. I randomly sample n measurements at t1 and at t2 separately.
  2. I prepare an array as input for pca() with m*n columns (m < 32768; it is a subset of all the observed bits, as the original data might be too big for pca) and 4 rows (one row for each device).
  3. On this array A, I compute the PCA: `[coeff, score, latent] = pca(zscore(A))`
  4. Then I try to visualize it using biplot: `biplot(coeff(:,1:2), 'score', score(:,1:2))`
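A minimal sketch of steps 2 to 4 on synthetic data. To avoid depending on the Statistics and Machine Learning Toolbox, it z-scores the columns by hand and computes the principal components with `svd` instead of `pca()`; the sizes `m` and `n` and the random data are assumptions. It also makes one consequence of the setup visible: with only 4 rows, the centered data have rank at most 3, so at most 3 components carry any variance.

```matlab
% Sketch of steps 2-4 on synthetic data (sizes and data are assumptions)
rng(1);                                     % reproducible random data
num_devices = 4;                            % one row per device, as in step 2
m = 2000;                                   % hypothetical number of sampled bit positions
n = 3;                                      % hypothetical number of sampled measurements
A = double(rand(num_devices, m*n) > 0.5);   % hypothetical 4 x (m*n) input array

% z-score each column; constant columns get sigma = 1, as MATLAB's zscore does
mu = mean(A, 1);
sigma = std(A, 0, 1);
sigma(sigma == 0) = 1;
Z = (A - mu) ./ sigma;                      % implicit expansion (MATLAB R2016b or later)

% PCA via SVD of the centered, scaled data
[U, S, V] = svd(Z, 'econ');                 % columns of V play the role of pca's coeff
score = U * S;                              % principal component scores
latent = diag(S).^2 / (num_devices - 1);    % component variances

% After centering, 4 rows have rank <= 3, so at most 3 components have variance > 0
num_nonzero = sum(latent > 1e-10 * latent(1));
```

Up to the sign of each component, `V(:,1:2)` and `score(:,1:2)` correspond to the `coeff(:,1:2)` and `score(:,1:2)` passed to `biplot` in step 4.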

However, this gives me very strange results. Maybe PCA is not the right approach for this problem? I also tried running the PCA not on the logical bit array itself but on a vector holding the indices of the '1' entries in the original measurement array. This also produces strange results.
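One possible reason the index-vector variant behaves oddly (an assumption, not something the thread confirms): `find` returns vectors of different lengths for different measurements, and the k-th entry of each vector refers to different bit positions, so the columns are not comparable variables. A minimal sketch with random stand-in data (`meas_a`, `meas_b` are hypothetical):

```matlab
% Two hypothetical measurements with (usually) different numbers of ones
rng(2);
meas_a = rand(4096, 8) > 0.5;
meas_b = rand(4096, 8) > 0.5;

idx_a = find(meas_a);      % linear indices of the ones
idx_b = find(meas_b);

% The lengths usually differ, so the vectors cannot be stacked as rows
% of one observation matrix without truncating or padding
len_a = numel(idx_a);
len_b = numel(idx_b);

% Even after truncation, column k holds "the k-th one", which points
% to different bit positions in different measurements
k = min(len_a, len_b);
same_position = mean(idx_a(1:k) == idx_b(1:k));
```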

As I am completely new to PCA, I would like to ask whether you see a flaw in this process, or whether PCA is simply not the right approach for my goal and I should look at other dimension-reduction approaches or clustering algorithms.

Answer by Dima Lituiev

Could this 'some kind of dependency' simply be pairwise correlation between your data points? Or what exactly do you want to find out?

Do you get the expected results with the following?

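%// map logical {0,1} to double {-1,+1}
meas_norm = 2*measurement1_dev1_time1 - 1;

%// 8x8 matrix of column inner products (uncentered, unnormalized)
CovarianceMatrix = meas_norm' * meas_norm;

figure
imagesc(CovarianceMatrix)   %// pcolor would drop the last row and column
colorbar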
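The matrix above is not centered. For comparison, a minimal sketch of the standard pairwise (Pearson) correlation between the 8 bit columns, using a random stand-in for `measurement1_dev1_time1`; `corrcoef` is given `double` input, and the manual computation shows what it does:

```matlab
% Random stand-in for measurement1_dev1_time1 (4096x8 logical)
rng(4);
meas = rand(4096, 8) > 0.5;

% Pearson correlation between the 8 columns (data are centered first)
R = corrcoef(double(meas));               % 8x8, ones on the diagonal

% Same result by hand: center, form the covariance, then normalize
Xc = double(meas) - mean(double(meas), 1);
C = (Xc' * Xc) / (size(Xc, 1) - 1);       % equals cov(double(meas))
d = sqrt(diag(C));
R_manual = C ./ (d * d');
```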

Could there be a problem with the data type? Try passing double(data) instead. (Please also add the actual code to your question.)

If you are looking for dimension reduction, you could also consider ICA (independent component analysis).


UPD: can you probe it with xor? Since xor cannot be applied along rows or columns directly, you can use all(x, dimension) as a trick. For example, with simulated data:

example = imread('cameraman.tif') > 128;   %// binary test image

meas_points = numel(example);
num_sensors = 4;

%// simulate data for t1: same pattern on every sensor, plus 0.1% random bit flips
meas_before = repmat(example(:), 1, num_sensors);
flickering_before = (rand(meas_points, num_sensors) < 0.001);
meas_before(flickering_before) = ~meas_before(flickering_before);

%// simulate positions of pixels that truly change, let's say 8%
true_change = (rand(meas_points, 1) < 0.08);

%// simulate data for t2: flip the truly changing pixels on all sensors, then add flicker
meas_after = repmat(example(:), 1, num_sensors);
meas_after(true_change, :) = ~meas_after(true_change, :);
flickering_after = (rand(meas_points, num_sensors) < 0.001);
meas_after(flickering_after) = ~meas_after(flickering_after);

%// a point is "stable" if all sensors agree (all ones or all zeros)
stable_points_after = all(meas_after, 2) | all(~meas_after, 2);
stable_point_fraction = sum(stable_points_after) ./ meas_points;

%// similarly for the states before (i.e. t1)
stable_points_before = all(meas_before, 2) | all(~meas_before, 2);

%// now see which points change coherently: stable at both times, value flipped
stable_both = stable_points_before & stable_points_after;
stable_change = xor(meas_after(stable_both, 1), meas_before(stable_both, 1));
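To see why the `all` trick stands in for an xor along rows, a minimal sketch on a tiny hypothetical 3x4 matrix (3 points, 4 sensors). It compares every sensor with sensor 1; for logical arrays `~=` acts as an elementwise xor, and `X(:,1)` expands across the columns (implicit expansion, MATLAB R2016b or later):

```matlab
% Tiny hypothetical example: 3 points (rows), 4 sensors (columns)
X = logical([1 1 1 1;
             0 0 0 0;
             1 0 1 1]);

% Trick from the answer: a row is "stable" if all entries are 1 or all are 0
agree = all(X, 2) | all(~X, 2);        % [true; true; false]

% Equivalent xor-style check: does any sensor differ from sensor 1?
disagree = X ~= X(:, 1);
agree_xor = ~any(disagree, 2);
```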