Skip to content

aguscasaletti/huffman-file-compression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman compression

Implementation of the Huffman algorithm in Java (IntelliJ).

  • Stores the compressed file with a made up ".u21" extension, with the following inner structure:
<'huf' prefix> <filename length> <filename> <encoded message (raw bytes)> <symbols table>

The huffman symbols table is a map that holds key-value pairs like so:

'a' --> '001'  ;  'b' --> '1011'

Where 'a' and 'b' are symbols obtained from the input text and '001' and '1011' are the codes generated by the huffman tree.

The symbols table has its own inner structure:

<symbols count>  <char (16-bit each) symbols ...> <codes lengths (for each of the symbol's code)...> <raw bytes of all codes>

i.e. if the symbols table is "'a' --> '001'  ;  'b' --> '1011'", then the serialization would be:
   <2> <'a', 'b'> <3, 4> <0011011> (adding padding bits)

Note: This implementation is far from perfect or optimized, and it can't compress binary files, only plain text files. It can however compress large text based files +6MB in a reasonable time at a 0.6≈ compression rate.

Releases

No releases published

Packages

No packages published

Languages