How to accumulate data-sets?


I have a vector with values between 1 and N (N > 1). Some values can occur multiple times consecutively. I want to collapse each run of consecutive identical entries into a single entry and add a second column that counts how many entries were in that run, e.g.:

A = [1 2 1 1 3 2 4 4 1 1 1 2]'

would lead to:

B = [1 1;
     2 1;
     1 2;
     3 1;
     2 1;
     4 2;
     1 3;
     2 1]

As you can see, the second column contains the number of consecutive entries. I recently came across accumarray() in MATLAB, but I can't find a solution to this task with it, since it always considers the whole vector rather than only consecutive entries.

Any idea?


There are 2 answers

Bill Cheatham On BEST ANSWER

This probably isn't the most readable or elegant way of doing it, but if you have large vectors and speed is an issue, this vectorisation may help...

A = [1 2 1 1 3 2 4 4 1 1 1 2];

First, pad A with a leading and a trailing zero to capture the first and final transitions. This works because the values in A are at least 1, so 0 never occurs in the data:

>>  A = [0, A, 0];

The transition locations can be found where the difference between neighbouring values is not equal to zero:

>> locations = find(diff(A)~=0);

Because we padded the start of A with a zero, the first transition is not a real one, so we only take the locations from 2:end. Indexing the padded A at these locations gives the value of each run:

>> first_column = A(locations(2:end))

first_column =

     1     2     1     3     2     4     1     2

That's the first column. Now we need the count for each run, which is the difference between successive locations. This is where padding A at both ends becomes important:

>> second_column = diff(locations)

second_column =

     1     1     2     1     1     2     3     1

Finally combining:

B = [first_column', second_column']

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
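
The steps above can be collected into a small function. This is only a minimal sketch: the name run_lengths_padded is hypothetical, and it assumes A is a nonempty numeric vector that never contains 0, since 0 is used as the pad value.

```matlab
function B = run_lengths_padded(A)
% Hypothetical helper: collapse runs of equal values into [value, count] rows.
% Assumes A is a nonempty numeric vector that never contains 0,
% because 0 is used as the pad value.
    A = A(:).';                                % force a row vector
    padded = [0, A, 0];                        % pad so the first and last runs are detected
    locations = find(diff(padded) ~= 0);       % positions where the value changes
    first_column = padded(locations(2:end));   % value of each run
    second_column = diff(locations);           % length of each run
    B = [first_column.', second_column.'];     % one row per run
end
```

Calling run_lengths_padded(A) with the A from the question should give the same B as shown above.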

This can all be combined into one less-readable line:

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = [A(find(diff([A; 0]) ~= 0)), diff(find(diff([0; A; 0])))]

B =

 1     1
 2     1
 1     2
 3     1
 2     1
 4     2
 1     3
 2     1
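
As for the accumarray() mentioned in the question: it can be applied once every run has its own label. The following is a sketch of that idea rather than part of the method above. It assumes A is a nonempty column vector, and the names is_start, run_id and counts are made up for illustration. Since it does not pad with zeros, it does not depend on 0 being absent from the data.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2]';      % column vector, as in the question
is_start = [true; diff(A) ~= 0];     % true wherever a new run begins
run_id = cumsum(is_start);           % label per element: 1 2 3 3 4 5 6 6 7 7 7 8
counts = accumarray(run_id, 1);      % add up 1 for every element sharing a label
B = [A(is_start), counts];           % value of each run next to its length
```

The cumsum over the run starts is what restricts accumarray to consecutive entries: equal values in different runs get different labels.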
Lucas On

I don't see another way than looping through the data set, but it is rather straightforward. Maybe this is not the most elegant solution, but as far as I can see, it works fine.

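function B = accum_data_set(A)
    % Collapse runs of equal values in A into rows of [value, count].
    B = [];
    if isempty(A)
        return;                   % nothing to count
    end
    prev = A(1);                  % value of the current run
    count = 1;                    % length of the current run
    for i = 2:length(A)
        if prev == A(i)
            count = count + 1;    % same value: extend the run
        else
            B = [B; prev count];  % run ended: store it and start a new one
            count = 1;
        end
        prev = A(i);
    end
    B = [B; prev count];          % store the final run
end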

Output:

>> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
>> B = accum_data_set(A)

B =

     1     1
     2     1
     1     2
     3     1
     2     1
     4     2
     1     3
     2     1
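
If the loop turns out to be slow on long vectors, one likely cause is that B grows by one row at a time. A possible variant, sketched here under the hypothetical name accum_data_set_prealloc, preallocates the worst-case size and trims the result at the end. It is meant as an illustration of the same loop idea, not as a measured improvement.

```matlab
function B = accum_data_set_prealloc(A)
% Hypothetical variant of accum_data_set that preallocates the output.
    n = numel(A);
    B = zeros(n, 2);                 % worst case: every element is its own run
    if n == 0
        return;                      % empty input gives an empty 0-by-2 result
    end
    k = 1;                           % row of the current run in B
    B(1, :) = [A(1), 1];
    for i = 2:n
        if A(i) == B(k, 1)
            B(k, 2) = B(k, 2) + 1;   % same value: extend the current run
        else
            k = k + 1;               % new value: start a new run
            B(k, :) = [A(i), 1];
        end
    end
    B = B(1:k, :);                   % drop the unused rows
end
```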