Introduction to Huffman Coding

Introduction to Huffman Coding

4 mins read1.4K Views Comment
Updated on Feb 9, 2024 17:06 IST

Huffman coding is a greedy approach based lossless data compression algorithm. It uses variable-length encoding to compress the data. The main idea in Huffman coding is to assign each character with a variable length code. The length of each character is decided on the basis of its occurrence frequency. The character that occurred most time would have the smallest code and the character that occurred least in the data would have the largest code.

2022_10_4.jpg

In this article, we are going to discuss Huffman coding, prefix rule, major steps in building a Huffman tree, and algorithm for Huffman coding, followed by the complexity and application of Huffman coding. 

Huffman coding

Huffman coding was introduced by David Huffman. It is a lossless data compression methodology used before sending the data so that data can be compressed and sent using minimal bits, without redundancy, and without losing any of the details. It generally compresses the most frequently occurring characters. 

The characters which have more frequency are assigned with the smallest code and the characters with larger frequency get assigned with the largest code. 

The codes are assigned in such a manner that each code will be unique and different for each character.

Also explore: Understanding Data Structures in C: Types And Operations

Recommended online courses

Best-suited Data Structures and Algorithms courses for you

Learn Data Structures and Algorithms with these high-rated online courses

– / –
4 months
– / –
16 weeks
Free
– / –
– / –
– / –
– / –
150 hours
– / –
6 months
– / –
4 months
– / –
8 weeks
β‚Ή4.24 K
6 weeks
– / –
12 weeks

Prefix rule

Huffman coding is implemented using the prefix rule. The variable-length codes assigned to the characters based on their frequency are the Prefix code. Prefix codes are used to remove ambiguity during the decoding of the data, no two characters can have the same prefix code. 

Major steps in building a Huffman tree

Huffman coding technique involves two major steps, which are as follows:

  • Creating a Huffman tree of the input characters.
  • Traversing the tree and assigning codes to the characters. 

You can also explore: Tree Traversal in Data Structure

Explore: Data Structures Online Courses & Certifications

Working or Creation of Huffman Tree

Huffman Coding step-by-step working or creation of Huffman Tree is as follows:

  • Step-1: Calculate the frequency of each string.
  • Step-2: Sort all the characters on the basis of their frequency in ascending order.
  • Step-3: Mark each unique character as a leaf node.
  • Step-4: Create a new internal node. 
  • Step-5: The frequency of the new node as the sum of the single leaf node
  • Step-6: Mark the first node as this left child and another node as the right child of the recently created node. 
  • Step-7: Repeat all the steps from step-2 to step-6.

You can also explore: Introduction to Linked List Data Structure

Formulas Used in Huffman Tree

Average code length per character = βˆ‘(frequencyi x code lengthi)/ βˆ‘ frequencyi 

Total number of bits in Huffman encoded message 

                     = Total number of characters in the message x Average code length per character

                     = βˆ‘ ( frequencyi x Code lengthi )

Example: 

Suppose a data file has the following characters and the frequencies. If huffman coding is used, calculate:

  • Huffman Code of each character
  • Average code length
  • Length of Huffman encoded data 
Characters Frequencies
A 12
B 15
C 7
D 13
E 9

Solution:

Initially, create the Huffman Tree:

The image shows Huffman tree.

Step-2:

The image shows Huffman coding.

Step-3:

Step-4:

The image shows Huffman coding.

Step-5:

The image shows Huffman tree.

The above tree is a Huffman Tree.

Now, assign weight to all the nodes. 

Assign β€œ0” to all left edges and β€œ1” to all right edges. 

The tree will become

Huffman Code of each character: 

A: 00 

B: 10

C: 110

D: 01

E: 111

Average code length per character = βˆ‘(frequencyi x code lengthi)/ βˆ‘ frequencyi 

                                                            = {(12 x 2) + (13 x 2)+ (15 x 2)+ ( 7 x 3)+ (9 x 3)} / (12 + 13 + 15 + 7+ 9) 

                                                            = (24+ 26+ 30 + 21 + 27)/56

                                                            = 128/56

                                                            = 2.28

Average code length per character = 2.28

Total number of bits in Huffman encoded message 

                     = Total number of characters in the message x Average code length per character

                     =  56 x 2.28

                     =  127.68

 

Algorithm for Huffman coding

For implementing Huffman coding, we need a priority queue that will have all the unique characters of the data. Then sort all the characters on the basis of their occurring frequency in ascending order. Take out the minimum value from the priority queue and assign it to the left child and similarly assign the value to the right child. 

The algorithm for Huffman coding is as follows:

create a priority queue X having each unique character.
sort them in ascending order based on their frequencies.
for ( all the unique characters):
    create a newNode
    extract minimum value from X and assign it to the left child of newNode
    extract minimum value from X and assign it to the right child of newNode
    find the sum of these two minimum values and assign it to the value of newNode
    insert this newNode into the tree
return rootNode

Complexity 

Time complexity is O (n logn).

Application of Huffman coding. 

  • Huffman coding is used in traditional compression techniques like GZIP etc. 
  • For tax and fax transmissions

FAQs

Is Huffman coding lossless or lossy compression?

Huffman coding is a lossless compression algorithm. It preserves all the original information from the input data during the compression and decompression process. When the compressed data is decompressed using the same Huffman tree, the original data is fully recovered without any loss.

What is the time complexity of the Huffman coding algorithm?

The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of unique characters or symbols in the input data. This complexity arises from building the Huffman tree by repeatedly merging nodes and constructing the variable-length codes.

Is Huffman coding used in practice?

Yes, Huffman coding is widely used in practice. It has been used in various applications, including file compression formats such as ZIP, GZIP, and DEFLATE. It is also used in image compression algorithms like JPEG and video compression algorithms like MPEG.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio