Given a vector V in Matlab I want to calculate the code length without generating the code...
v = [0.1,0.1,0.1,0.2,0.2,0.3]; the code length = 17. How can I calculate it without generating the code.
Thanks
Given a vector V in Matlab I want to calculate the code length without generating the code...
v = [0.1,0.1,0.1,0.2,0.2,0.3]; the code length = 17. How can I calculate it without generating the code.
Thanks
Maybe I wasn't clear with my question, but I think this code will return the minimal code length for a vector of data 'v'.
% return the huffman lenght of a matrix
function S = hufman_length(v)
v = (v(:));
v = hist(v,256);
v = v(find(v>0));
S = 0;
%acumulating the probability
while (length(v) >= 2)
v = sort(v);
S = S + v(1) + v(2);
v(2) = v(1) + v(2);
v = v(2:length(v));
end
end
So there are six symbols? Then the maximum code length can't be 17. The maximum code length with six symbols for any set of frequencies is five bits. (0, 10, 110, 1110, 11110, 11111).
For that particular set of probabilities, assuming one symbol per probability and that the probabilities are exact, you can get two different codes depending on the choices made when executing the Huffman algorithm. One has a maximum length of 3, the other a maximum length of 4. Both codes are equally optimal in coding the symbols. The two codes have code lengths, in the same frequency order, (4,4,3,2,2,2) and (3,3,3,3,2,2).
You may mean the sum of the bits over the six possible symbols, which is in fact 17 for one of the codes, but 16 for the other. However that is a meaningless measure, since you have used each symbol once, in contradiction to their stated probabilities. A useful measure would be multiplying each symbol length in bits by the probability to get an average symbol length in bits. That is 2.5 bits for both of those codes. That is how you verify that both codes are equally optimal.
In general you need to apply the Huffman algorithm in order to determine the maximum code length. There is no other shortcut. You can traverse the tree to find the maximum length. You do not need to explicitly generate the code per se, but the code is implied by the tree.
You can compute the entropy to get a lower bound on the average symbol length in bits. That is the sum of each probability times its negative base-2 logarithm. In this case, the entropy is 2.446.