Compressing text with good spelling, and canonical huffman code

Question

Compressing text with good spelling, and canonical huffman code

491 views Asked by jokoon At 18 October 2012 at 23:29

I want to compress text by using words as symbols instead of characters, I don't really know if it's a good idea, but I just want to test it (for science).

The problem is, I can't really store all the words of the english language, so I have gathered a list of very common words (around 1600 words) which I plan to alter just like spellcheckers store derived forms of words. (example: kill, kill-ing, kill-er, kill-s depending if it's a verb, adjective etc)

http://en.wikipedia.org/wiki/Canonical_Huffman_code

I'd like to know if this special version of huffman coding suits my need, since the 'dictionary' won't change often and can be distributed with the decompressing tool. It also seems I would have to dictate frequencies of the words when creating my original huffman tree before turning it into a canonical huffman tree.

Can you correct me if I'm missing a point here, or if it is a good or bad idea ?

Original Q&A

There are 1 answers

**Ujjwal Singh** · Accepted Answer · 2012-10-19T08:07:23+00:00

The point to note here is that this special variant has the advantage of only smaller codebook and not the compressed data. Thus use it wherever you need to include the huffman codebook along with your data provided the pieces you replace are sequential. Since words naturally can be sequenced in an order - you can, and thus you should - use Canonical Huffman Code.

TechQA.

Compressing text with good spelling, and canonical huffman code

There are 1 answers

Related Questions in COMPRESSION

Related Questions in PREFIX

Related Questions in HUFFMAN-CODE

Related Questions in PREFIX-TREE

Popular Questions

Trending Questions