Introduction to Huffman Coding
Huffman coding is a greedy approach based lossless data compression algorithm. It uses variable-length encoding to compress the data. The main idea in Huffman coding is to assign each character with a variable length code. The length of each character is decided on the basis of its occurrence frequency. The character that occurred most time would have the smallest code and the character that occurred least in the data would have the largest code.
In this article, we are going to discuss Huffman coding, prefix rule, major steps in building a Huffman tree, and algorithm for Huffman coding, followed by the complexity and application of Huffman coding.
Huffman coding
Huffman coding was introduced by David Huffman. It is a lossless data compression methodology used before sending the data so that data can be compressed and sent using minimal bits, without redundancy, and without losing any of the details. It generally compresses the most frequently occurring characters.
The characters which have more frequency are assigned with the smallest code and the characters with larger frequency get assigned with the largest code.
The codes are assigned in such a manner that each code will be unique and different for each character.
Also explore: Understanding Data Structures in C: Types And Operations
Best-suited Data Structures and Algorithms courses for you
Learn Data Structures and Algorithms with these high-rated online courses
Prefix rule
Huffman coding is implemented using the prefix rule. The variable-length codes assigned to the characters based on their frequency are the Prefix code. Prefix codes are used to remove ambiguity during the decoding of the data, no two characters can have the same prefix code.
Major steps in building a Huffman tree
Huffman coding technique involves two major steps, which are as follows:
- Creating a Huffman tree of the input characters.
- Traversing the tree and assigning codes to the characters.
You can also explore: Tree Traversal in Data Structure
Explore: Data Structures Online Courses & Certifications
Working or Creation of Huffman Tree
Huffman Coding step-by-step working or creation of Huffman Tree is as follows:
- Step-1: Calculate the frequency of each string.
- Step-2: Sort all the characters on the basis of their frequency in ascending order.
- Step-3: Mark each unique character as a leaf node.
- Step-4: Create a new internal node.
- Step-5: The frequency of the new node as the sum of the single leaf node
- Step-6: Mark the first node as this left child and another node as the right child of the recently created node.
- Step-7: Repeat all the steps from step-2 to step-6.
You can also explore: Introduction to Linked List Data Structure
Formulas Used in Huffman Tree
Average code length per character = β(frequencyi x code lengthi)/ β frequencyi
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= β ( frequencyi x Code lengthi )
Example:
Suppose a data file has the following characters and the frequencies. If huffman coding is used, calculate:
- Huffman Code of each character
- Average code length
- Length of Huffman encoded data
Characters | Frequencies |
---|---|
A | 12 |
B | 15 |
C | 7 |
D | 13 |
E | 9 |
Solution:
Initially, create the Huffman Tree:
Step-2:
Step-3:
Step-4:
Step-5:
The above tree is a Huffman Tree.
Now, assign weight to all the nodes.
Assign β0β to all left edges and β1β to all right edges.
The tree will become
Huffman Code of each character:
A: 00
B: 10
C: 110
D: 01
E: 111
Average code length per character = β(frequencyi x code lengthi)/ β frequencyi
= {(12 x 2) + (13 x 2)+ (15 x 2)+ ( 7 x 3)+ (9 x 3)} / (12 + 13 + 15 + 7+ 9)
= (24+ 26+ 30 + 21 + 27)/56
= 128/56
= 2.28
Average code length per character = 2.28
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= 56 x 2.28
= 127.68
Algorithm for Huffman coding
For implementing Huffman coding, we need a priority queue that will have all the unique characters of the data. Then sort all the characters on the basis of their occurring frequency in ascending order. Take out the minimum value from the priority queue and assign it to the left child and similarly assign the value to the right child.
The algorithm for Huffman coding is as follows:
create a priority queue X having each unique character.
sort them in ascending order based on their frequencies.
for ( all the unique characters):
create a newNode
extract minimum value from X and assign it to the left child of newNode
extract minimum value from X and assign it to the right child of newNode
find the sum of these two minimum values and assign it to the value of newNode
insert this newNode into the tree
return rootNode
Complexity
Time complexity is O (n logn).
Application of Huffman coding.
- Huffman coding is used in traditional compression techniques like GZIP etc.
- For tax and fax transmissions
FAQs
Is Huffman coding lossless or lossy compression?
Huffman coding is a lossless compression algorithm. It preserves all the original information from the input data during the compression and decompression process. When the compressed data is decompressed using the same Huffman tree, the original data is fully recovered without any loss.
What is the time complexity of the Huffman coding algorithm?
The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of unique characters or symbols in the input data. This complexity arises from building the Huffman tree by repeatedly merging nodes and constructing the variable-length codes.
Is Huffman coding used in practice?
Yes, Huffman coding is widely used in practice. It has been used in various applications, including file compression formats such as ZIP, GZIP, and DEFLATE. It is also used in image compression algorithms like JPEG and video compression algorithms like MPEG.
This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio