A Huffman compressor is a compressor based on the Huffman Codification. This algorithm examines the text counting the frequences of each character. Then, it makes a new codification using that frequences. For example, in order to reduce the space, it`s easy to see that if we store 'C' letter 5 times as 0, 'A' letter 2 times as 10 and 'B' letter 3 times as 11 we achieve reducing the size compared with binary codification:
Huffman codification | Binary codification | |
Text | C C C A B C A B C | C C C A B C A B C |
Codification | 0 0 0 10 11 0 10 11 0 | 10 10 10 00 01 10 00 01 |
Total symbols | 13 | 18 |
Note that the characters with greater frequency have a codification with less symbols than the characters with lesser frequency. That's the key. As you can guess, the way to make a codification is building a binary tree, each character is a leaf node that is placed in n-level based on its frequency.
Following this idea I have developed a simple application to demonstrate how we can reduce the size of any text, although the implementation is quite more difficult. Furthermore, We do not forget we have a custom codification for each text compressed so we need to store that codification too. You can get more information about Huffman codification here.
Huffman Compressor application is able to read a file text, to make a Huffman codification and to compress the text. Also, for academic purposes, the text can be compressed in separately blocks.
First, download the file here and then, follow the next instructions:
Step 1. Filling the form. |
Step 2. Making a codification. |
Step 3. Compressing. |
Step 4. Watching results. |
Step 5. Decompressing. |