OCR: Automated Text Recognition from Images


Senior Manager Content

Updated on Apr 30, 2023 15:40 IST

OCR (Optical Character Recognition) is a technology that extracts text from images. Its applications include digitizing printed documents, extracting text from images for translation or search, and assisting visually impaired individuals. In this project, we will build an OCR system that automatically recognizes text in images. Rather than training a model from scratch, we will use EasyOCR, a library that ships pre-trained deep learning models, to detect characters and convert them into machine-readable text. Our goal is an efficient OCR system that handles a wide range of input images and produces accurate results.

The objective of this project is to develop a program that can automatically recognize and extract text from images using computer vision and machine learning techniques.

Prerequisites for Optical Character Recognition:

Programming skills in Python
Basic image processing knowledge (resizing, binarization, noise removal)
Knowledge of machine learning and deep learning concepts
Understanding of linear algebra
Knowledge of data structures and algorithms
Familiarity with NLP concepts

Skills you will Learn from this project:

Image processing: OCR requires preprocessing of images, such as resizing, binarization, noise removal, etc. These techniques are useful in many other computer vision projects.
Machine learning: OCR systems use machine learning models to recognize text in images. Working on such a project can give you experience in building, training, and deploying machine learning models.
Programming: OCR projects typically require programming skills in a language such as Python. Developing OCR algorithms and integrating them with other software requires proficiency in programming.
Problem-solving: OCR projects present many challenges such as handling different font styles, sizes, text orientation, and background noise. Addressing these challenges requires creative problem-solving skills.
Attention to detail: Since OCR involves recognizing individual characters in images, it requires a high level of attention to detail and precision.
Collaboration: OCR projects often require collaboration with other experts in fields such as image processing, machine learning, and software development.

OCR Project Description:

The project will involve designing and implementing an automated text recognition program that can accurately detect and extract text from a variety of images. The program will use techniques such as image preprocessing, feature extraction, and machine learning algorithms to achieve this task. The final product will be a software tool that can be used for a variety of applications, such as digitizing text from documents, extracting information from receipts, and automatically recognizing text in images for accessibility purposes.
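As a rough illustration of the binarization step that image preprocessing usually includes, here is a minimal pure-Python sketch. It assumes a grayscale image stored as a nested list of 0-255 intensities and uses the mean intensity as a global threshold. The function name binarize and the sample pixel values are illustrative assumptions, and a real pipeline would more likely rely on OpenCV's thresholding functions.

```python
def binarize(gray, threshold=None):
    """Return a black-and-white copy of a grayscale image (nested lists of 0-255)."""
    pixels = [p for row in gray for p in row]
    if not pixels:
        return []
    if threshold is None:
        # Assumption: use the mean intensity as a simple global threshold
        threshold = sum(pixels) / len(pixels)
    # Pixels brighter than the threshold become white (255), the rest black (0)
    return [[255 if p > threshold else 0 for p in row] for row in gray]


# Tiny made-up 3x3 "image": dark strokes on the left, bright background on the right
sample = [
    [12, 200, 220],
    [15, 30, 210],
    [10, 25, 240],
]

binary = binarize(sample)
for row in binary:
    print(row)
```

Binarization of this kind separates text from background before recognition; EasyOCR does its own preprocessing internally, so this step is optional in the walkthrough below.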

Step 1: Set up the environment

First, you need to set up the environment by installing the required modules. You can do this by running the following code in a code cell:

!pip install opencv-python-headless matplotlib easyocr

This will install the required modules: opencv-python-headless for image processing, matplotlib for visualization, and easyocr for text recognition.
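To confirm that the installation worked before moving on, a small check along the following lines may help. It is a sketch using only the standard library; the helper name missing_modules is an assumption made for illustration. Note that the import names differ from the pip package names (opencv-python-headless is imported as cv2).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip package names: opencv-python-headless -> cv2
required = ["cv2", "matplotlib", "easyocr"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```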

Step 2: Load the image

Next, you need to load the image into the notebook. You can do this by uploading the image to your Google Drive and then mounting your Google Drive in the notebook. You can then load the image using the file path. Here’s the code to mount your Google Drive:

from google.colab import drive
drive.mount('/content/drive')

This will prompt you to authorize access to your Google Drive; follow the on-screen instructions to complete the authorization.

Once you’ve mounted your Google Drive, you can load the image using the following code. Do remember to upload a sample image and use its path.

import cv2
import matplotlib.pyplot as plt

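# Load the image (cv2.imread returns None instead of raising an error if the path is wrong)
image_path = '/content/drive/MyDrive/Datasets/OCR_Detection/Test_OCR.jpg'
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")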

# Display the image
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()

This code will load the image from the file path ‘/content/drive/MyDrive/Datasets/OCR_Detection/Test_OCR.jpg’ using cv2.imread() and display it using matplotlib.pyplot.imshow().

Step 3: Recognize text in the image

Next, you need to recognize the text in the image using EasyOCR. You can do this by importing EasyOCR (already installed in Step 1), creating a reader, and calling its readtext() method. Here’s the code:

First, import the EasyOCR library, a Python package that uses deep learning models to detect and recognize text in images.

import easyocr

Next, create an OCR reader object. The call below sets the language list to English (‘en’) and sets GPU usage to True.

# Initialize the OCR reader
reader = easyocr.Reader(lang_list=['en'], gpu=True)

# Read the text in the image
results = reader.readtext(image)

The line above reads the text from the input image using the OCR reader and stores the results in a list called results.
Each element in the results list is a tuple containing the bounding box coordinates, the recognized text, and the confidence score for a detected word or line of text in the image.
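To make the shape of results concrete, the sketch below works on a hand-written list in the same (bounding box, text, confidence) form. The bounding boxes and strings are copied from the sample outputs shown in this article, but the confidence scores are invented placeholders, and the helper filter_by_confidence with its 0.5 threshold is an illustrative assumption rather than part of EasyOCR.

```python
def filter_by_confidence(results, min_score=0.5):
    """Keep only the recognized strings whose confidence is at least min_score."""
    return [text for _bbox, text, score in results if score >= min_score]


# Hand-written stand-in for reader.readtext(image); the scores are made up.
sample_results = [
    ([[542, 40], [708, 40], [708, 90], [542, 90]], "shiksha", 0.98),
    ([[543, 78], [688, 78], [688, 126], [543, 126]], "online", 0.95),
    ([[0, 505], [1031, 505], [1031, 618], [0, 618]], "loptical Character Recognition", 0.41),
]

confident = filter_by_confidence(sample_results)
print(confident)
print(" ".join(confident))
```

Filtering on the confidence score is one simple way to drop doubtful detections before using the text downstream; a suitable threshold depends on the images and has to be chosen by experiment.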

# Print the text
for result in results:
    print(result[1])

Output:

shiksha
online
loptical Character Recognition

This loop iterates over the results list and, for each tuple, prints the second element, which is the recognized text, to the console.

Taken together, the code in this step initializes the OCR reader with English as the language and GPU acceleration enabled, recognizes the text in the image using reader.readtext(), and prints each recognized string.
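EasyOCR is built on PyTorch, so one way to avoid hard-coding the GPU setting is to ask PyTorch whether a CUDA device is usable. The following is a minimal sketch; the helper name gpu_available is hypothetical, and it deliberately falls back to False when PyTorch is not installed.

```python
def gpu_available():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so assume CPU only
        return False
    return bool(torch.cuda.is_available())


use_gpu = gpu_available()
print(f"GPU acceleration available: {use_gpu}")
# The flag could then be passed on, e.g. easyocr.Reader(lang_list=['en'], gpu=use_gpu)
```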

Step 4: Draw bounding boxes around the text

Finally, you can draw bounding boxes around the text in the image using OpenCV. You can do this by looping through the results returned by EasyOCR and using OpenCV’s cv2.rectangle() method to draw a rectangle around each text region. Here’s the code:

# Initialize the OCR reader
reader = easyocr.Reader(lang_list=['en'], gpu=True)

This line creates the OCR reader object exactly as in Step 3, with English (‘en’) as the language and GPU usage set to True.
If the reader from Step 3 is still in memory, you can reuse it instead of creating a new one.

# Read the text in the image
try:
    results = reader.readtext(image)
except Exception as e:
    print(f"An error occurred while reading the text: {e}")
    results = []

This block of code reads the text from the input image using the OCR reader.
It wraps the OCR operation in a try-except block to handle any exceptions that might occur during OCR, such as if the input image is not valid.
If an exception occurs, it prints an error message and sets the results variable to an empty list.

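# Draw bounding boxes around the text and print the coordinates
for bbox, text, score in results:
    # EasyOCR may return float or NumPy coordinates; OpenCV expects integer tuples
    top_left = tuple(map(int, bbox[0]))
    bottom_right = tuple(map(int, bbox[2]))
    cv2.rectangle(image, top_left, bottom_right, (255, 0, 0), 5)
    print(f"BBox: {bbox}")

    # Add the recognized text just above the bounding box
    cv2.putText(image, text, (top_left[0], top_left[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)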

Output:

BBox: [[542, 40], [708, 40], [708, 90], [542, 90]]
BBox: [[543, 78], [688, 78], [688, 126], [543, 126]]
BBox: [[0, 505], [1031, 505], [1031, 618], [0, 618]]

This loop iterates over the results of the OCR operation, which is a list of tuples containing the bounding box coordinates, text, and confidence score for each detected word or line of text.
For each tuple, the code draws a rectangle around the text in the input image using the OpenCV cv2.rectangle method.
It also prints the bounding box coordinates to the console.
Additionally, it adds the OCR text over the bounding box using the cv2.putText method.
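Each bounding box is a list of four corner points. For tilted text, the first and third corners may not span the whole region, so one option is to take the minimum and maximum over all four corners. The sketch below shows that idea in plain Python; the helper name bbox_to_rect and the tilted example values are assumptions made for illustration.

```python
def bbox_to_rect(bbox):
    """Convert four (x, y) corner points to integer (x_min, y_min, x_max, y_max)."""
    xs = [int(round(float(x))) for x, _y in bbox]
    ys = [int(round(float(y))) for _x, y in bbox]
    return min(xs), min(ys), max(xs), max(ys)


# First bounding box from the output above
box = [[542, 40], [708, 40], [708, 90], [542, 90]]
print(bbox_to_rect(box))

# Made-up, slightly rotated box with float coordinates
tilted = [[10.4, 5.0], [50.6, 8.2], [49.0, 30.7], [9.1, 27.4]]
print(bbox_to_rect(tilted))
```

The returned corners can be passed to cv2.rectangle as (x_min, y_min) and (x_max, y_max), which also guarantees integer coordinates.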

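# Display the image with the bounding boxes
# Convert BGR (OpenCV) to RGB (Matplotlib) so the colors display correctly
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.savefig('/content/drive/MyDrive/Datasets/OCR_Detection/image_with_bboxes.png', dpi=100, bbox_inches='tight')
plt.show()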

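Output: [image: the input image with bounding boxes and recognized text overlaid]

Finally, this code displays the input image with the overlaid bounding boxes and OCR text using the plt.imshow and plt.show methods from the Matplotlib library, and saves a copy to Google Drive using plt.savefig.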

Future Scope of OCR Project: DIY

Once the automated text recognition program is developed, there are several potential future applications and enhancements that could be considered. The table below lists improvements and features you could add to this code:

Feature	Description	Logic
Multi-language support	Add support for recognizing text in languages other than English	1. Train the OCR algorithm on datasets of different languages 2. Use pre-trained models that have been specifically designed for multi-lingual recognition
Handwritten text recognition	Add support for recognizing handwritten text	1. Train the OCR algorithm on datasets of handwritten text 2. Use pre-trained models that have been specifically designed for handwriting recognition
Table recognition	Add support for recognizing and extracting information from tables	1. Use computer vision techniques to identify table structures in documents 2. Use OCR to recognize and extract text from cells in the table
Improved accuracy	Improve the accuracy of the OCR program	1. Experiment with different algorithms, feature extraction techniques, or deep learning architectures 2. Use larger or more diverse training datasets to train the OCR algorithm
Real-time recognition	Add support for real-time recognition of text	1. Optimize the OCR algorithm for speed 2. Integrate the OCR program with technologies such as live video streams or camera feeds
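Before trying to improve accuracy, it helps to measure it. A common metric is the character error rate (CER): the edit distance between the recognized text and the true text, divided by the length of the true text. The sketch below is a plain-Python illustration; the reference string is an assumed ground truth for the sample image, not something stated by the OCR output itself.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(predicted, reference):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(predicted, reference) / len(reference)


predicted = "loptical Character Recognition"  # OCR output shown in Step 3
reference = "Optical Character Recognition"   # assumed ground truth

print(edit_distance(predicted, reference))
print(round(character_error_rate(predicted, reference), 3))
```

Tracking this number on a small set of labeled images gives a baseline against which changes such as different preprocessing or models can be compared.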

About the Author

Atul Harsha

Senior Manager Content

Experienced AI and Machine Learning content creator with a passion for using data to solve real-world challenges. I specialize in Python, SQL, NLP, and Data Visualization. My goal is to make data science engaging an...
