So I recently built a Huffman tree and it worked well for encoding text. Now I want to see if I can compress images as well. Intuitively, images with little variation (like abstract images) should compress better. My question is whether I'm thinking about this correctly. Let's say I have a JPG image that is 786x463 pixels. Each pixel has 3 color values (RGB), each of which can take 256 different values, so 8 bits per channel and effectively 24 bits per pixel. That gives (786 x 463) * 24 = 8,734,032 bits, which is about 1 MB.
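The size arithmetic can be sanity-checked in a few lines (this is just the calculation above, not part of the Huffman code itself):

```java
public class RawSize {
    public static void main(String[] args) {
        int width = 786, height = 463;
        // 3 channels x 8 bits per channel = 24 bits per pixel
        long bits = (long) width * height * 24;
        long bytes = bits / 8;
        System.out.println(bits + " bits = " + bytes + " bytes");
        // prints: 8734032 bits = 1091754 bytes (about 1 MB)
    }
}
```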
Now I can loop over all the pixels, put the three color values in a list, then count the occurrences of each value from that list, remove the duplicates, and put them in my tree.
private void saveColors() {
    BufferedImage img = readImage(url);
    int width = img.getWidth();
    int height = img.getHeight();
    int size = width * height * 3; // three samples (R, G, B) per pixel
    pixels = new ArrayList<Integer>(size);
    WritableRaster inraster = img.getRaster();
    for (int i = 0; i < width; i++)
        for (int j = 0; j < height; j++) {
            pixels.add(inraster.getSample(i, j, 0)); // red
            pixels.add(inraster.getSample(i, j, 1)); // green
            pixels.add(inraster.getSample(i, j, 2)); // blue
        }
    countColors();
}
/**
 * Counts occurrences and removes duplicates
 */
private void countColors() {
    // Distinct color values, obtained by round-tripping through a set
    List<Integer> duplicateRemoved = new ArrayList<Integer>(new HashSet<Integer>(pixels));
    pixelFreq = new int[duplicateRemoved.size()];
    pixelArr = new int[duplicateRemoved.size()];
    for (int i = 0; i < pixelArr.length; i++) {
        int p = duplicateRemoved.get(i);
        pixelArr[i] = p;
        pixelFreq[i] = Collections.frequency(pixels, p);
    }
    // Build tree...
}
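As a side note, `Collections.frequency` rescans the whole sample list once per distinct color, which gets slow for large images. A single-pass count with a map avoids that. Here is a minimal sketch; the small hard-coded list just stands in for the samples that `saveColors()` would collect:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequencyCount {
    public static void main(String[] args) {
        // Stand-in for the sample values collected in saveColors()
        List<Integer> pixels = new ArrayList<>(List.of(255, 255, 0, 255, 0, 255));

        // One pass: map each color value to its number of occurrences
        Map<Integer, Integer> freq = new HashMap<>();
        for (int sample : pixels) {
            freq.merge(sample, 1, Integer::sum);
        }

        System.out.println(freq.get(255) + " occurrences of 255, "
                + freq.get(0) + " of 0");
        // prints: 4 occurrences of 255, 2 of 0
    }
}
```

The map's key set also gives you the distinct values directly, so the separate duplicate-removal step goes away.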
So I tried to encode a picture that is just black and white.
And the encoding results gave me this:
color -- freq -- code -- total bits
'0' -- 4410 -- '0010'--17640
'1' -- 1185 -- '001001'--7110
'2' -- 858 -- '00100'--4290
'3' -- 591 -- '00100101'--4728
'4' -- 315 -- '0010010101'--3150
'5' -- 180 -- '0010010101010'--2340
'6' -- 75 -- '0010001010010'--975
'7' -- 30 -- '00100010100100'--420
'8' -- 9 -- '0010001010010000'--144
'246' -- 3 -- '001000101001000'--45
'247' -- 15 -- '001000101001000'--225
'248' -- 84 -- '0010001010'--840
'249' -- 138 -- '001001010101'--1656
'250' -- 273 -- '0010001010'--2730
'251' -- 480 -- '0010001'--3360
'252' -- 1227 -- '0001'--4908
'253' -- 1749 -- '00010'--8745
'254' -- 2475 -- '00'--4950
'255' -- 1077657 -- '0'--1077657
original -- compressed -- reduction -- percent
8734032 -- 1145913 -- 7588119 -- 86.879906%
Which seems reasonable: white occurs the most times and gets the shortest code, while black gets a slightly longer code. As you can see, the total number of bits ended up being 1,145,913, which is about 0.14 MB. So we calculate the difference: 8,734,032 - 1,145,913 = 7,588,119 bits saved, which is about 0.9 MB, an 87% reduction.
So my question is: am I thinking about this right? That is, counting the frequency of every color value of every pixel?