Introduction to Huffman Coding

4 mins read1.4K Views Comment

Updated on Feb 9, 2024 17:06 IST

Huffman coding is a greedy approach based lossless data compression algorithm. It uses variable-length encoding to compress the data. The main idea in Huffman coding is to assign each character with a variable length code. The length of each character is decided on the basis of its occurrence frequency. The character that occurred most time would have the smallest code and the character that occurred least in the data would have the largest code.

In this article, we are going to discuss Huffman coding, prefix rule, major steps in building a Huffman tree, and algorithm for Huffman coding, followed by the complexity and application of Huffman coding.

Huffman coding

Huffman coding was introduced by David Huffman. It is a lossless data compression methodology used before sending the data so that data can be compressed and sent using minimal bits, without redundancy, and without losing any of the details. It generally compresses the most frequently occurring characters.

The characters which have more frequency are assigned with the smallest code and the characters with larger frequency get assigned with the largest code.

The codes are assigned in such a manner that each code will be unique and different for each character.

Also explore: Understanding Data Structures in C: Types And Operations

Recommended online courses

Best-suited Data Structures and Algorithms courses for you

Learn Data Structures and Algorithms with these high-rated online courses

Python Foundation with Data Structures & Algorithms

Coding NinjasCertificate

4.6

Total Fees

– / –

Duration

4 months

Foundation of Data Structures

IIT BombayCertificate

5.0

Total Fees

– / –

Duration

16 weeks

Parallel Algorithm

IIT KanpurCertificate

Total Fees

Free

Duration

– / –

Programming and Data Structures

IIT KharagpurCertificate

4.4

Total Fees

– / –

Duration

– / –

DSA to Development A Complete Guide

GeeksforGeeksCertificate

Total Fees

– / –

Duration

150 hours

Data Structures & Algorithm

Coding NinjasCertificate

5.0

Total Fees

– / –

Duration

6 months

Foundation for Competitive Programming

Coding NinjasCertificate

5.0

Total Fees

– / –

Duration

4 months

Numerical Methods

NPTELCertificate

Total Fees

– / –

Duration

8 weeks

Programming for Problem Solving (KCS101/KCS201) (M)

IIT KanpurCertificate

Total Fees

₹4.24 K

Duration

6 weeks

Data Structures

Online Cell- Centre for Distance EducationCertificate

Total Fees

– / –

Duration

12 weeks

Prefix rule

Huffman coding is implemented using the prefix rule. The variable-length codes assigned to the characters based on their frequency are the Prefix code. Prefix codes are used to remove ambiguity during the decoding of the data, no two characters can have the same prefix code.

Major steps in building a Huffman tree

Huffman coding technique involves two major steps, which are as follows:

Creating a Huffman tree of the input characters.
Traversing the tree and assigning codes to the characters.

You can also explore: Tree Traversal in Data Structure

Explore: Data Structures Online Courses & Certifications

Working or Creation of Huffman Tree

Huffman Coding step-by-step working or creation of Huffman Tree is as follows:

Step-1: Calculate the frequency of each string.
Step-2: Sort all the characters on the basis of their frequency in ascending order.
Step-3: Mark each unique character as a leaf node.
Step-4: Create a new internal node.
Step-5: The frequency of the new node as the sum of the single leaf node
Step-6: Mark the first node as this left child and another node as the right child of the recently created node.
Step-7: Repeat all the steps from step-2 to step-6.

You can also explore: Introduction to Linked List Data Structure

Formulas Used in Huffman Tree

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= ∑ ( frequency_i x Code length_i)

Example:

Suppose a data file has the following characters and the frequencies. If huffman coding is used, calculate:

Huffman Code of each character
Average code length
Length of Huffman encoded data

Characters	Frequencies
A	12
B	15
C	7
D	13
E	9

Solution:

Initially, create the Huffman Tree:

Step-2:

Step-3:

Step-4:

Step-5:

The above tree is a Huffman Tree.

Now, assign weight to all the nodes.

Assign “0” to all left edges and “1” to all right edges.

The tree will become

Huffman Code of each character:

A: 00

B: 10

C: 110

D: 01

E: 111

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

= {(12 x 2) + (13 x 2)+ (15 x 2)+ ( 7 x 3)+ (9 x 3)} / (12 + 13 + 15 + 7+ 9)

= (24+ 26+ 30 + 21 + 27)/56

= 128/56

= 2.28

Average code length per character = 2.28

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= 56 x 2.28

= 127.68

Algorithm for Huffman coding

For implementing Huffman coding, we need a priority queue that will have all the unique characters of the data. Then sort all the characters on the basis of their occurring frequency in ascending order. Take out the minimum value from the priority queue and assign it to the left child and similarly assign the value to the right child.

The algorithm for Huffman coding is as follows:

create a priority queue X having each unique character.

sort them in ascending order based on their frequencies.

for ( all the unique characters):

    create a newNode

    extract minimum value from X and assign it to the left child of newNode

    extract minimum value from X and assign it to the right child of newNode

    find the sum of these two minimum values and assign it to the value of newNode

    insert this newNode into the tree

return rootNode

Complexity

Time complexity is O (n logn).

Application of Huffman coding.

Huffman coding is used in traditional compression techniques like GZIP etc.
For tax and fax transmissions

FAQs

Is Huffman coding lossless or lossy compression?

Huffman coding is a lossless compression algorithm. It preserves all the original information from the input data during the compression and decompression process. When the compressed data is decompressed using the same Huffman tree, the original data is fully recovered without any loss.

What is the time complexity of the Huffman coding algorithm?

The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of unique characters or symbols in the input data. This complexity arises from building the Huffman tree by repeatedly merging nodes and constructing the variable-length codes.

Is Huffman coding used in practice?

Yes, Huffman coding is widely used in practice. It has been used in various applications, including file compression formats such as ZIP, GZIP, and DEFLATE. It is also used in image compression algorithms like JPEG and video compression algorithms like MPEG.

About the Author

Shiksha Online

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio

Introduction to Huffman Coding

Huffman coding

Best-suited Data Structures and Algorithms courses for you

Python Foundation with Data Structures & Algorithms

Foundation of Data Structures

Parallel Algorithm

Programming and Data Structures

DSA to Development A Complete Guide

Data Structures & Algorithm

Foundation for Competitive Programming

Numerical Methods

Programming for Problem Solving (KCS101/KCS201) (M)

Data Structures

Prefix rule

Major steps in building a Huffman tree

Working or Creation of Huffman Tree

Formulas Used in Huffman Tree

Example:

Algorithm for Huffman coding

Complexity

Application of Huffman coding.

FAQs

Top Picks & New Arrivals