OCR: Automated Text Recognition from Images

Atul Harsha
Senior Manager Content
Updated on Apr 30, 2023 15:40 IST

OCR (Optical Character Recognition) is a technology that extracts text from images. It has many applications, such as digitizing printed documents, extracting text from images for translation or search, and assisting visually impaired people. In this project, we will build an OCR program that automatically recognizes text in images. Instead of training a model from scratch, we will use EasyOCR, a library whose pre-trained deep learning models detect text regions and convert the characters they contain into machine-readable text. The goal is an efficient OCR program that handles a wide range of input images and produces accurate results.


The objective of this project is to develop a program that can automatically recognize and extract text from images using computer vision and machine learning techniques.

Prerequisites for Optical Character Recognition:

  • Programming skills in Python
  • Basic image processing knowledge (resizing, binarization, noise removal)
  • Knowledge of machine learning and deep learning concepts
  • Understanding of linear algebra
  • Knowledge of data structures and algorithms
  • Familiarity with NLP concepts

Skills you will learn from this project:

  • Image processing: OCR requires preprocessing of images, such as resizing, binarization, noise removal, etc. These techniques are useful in many other computer vision projects.
  • Machine learning: OCR systems use machine learning models to recognize text in images. Working on such a project can give you experience in building, training, and deploying machine learning models.
  • Programming: OCR projects typically require programming skills, most commonly in Python. Developing OCR algorithms and integrating them with other software requires proficiency in programming.
  • Problem-solving: OCR projects present many challenges such as handling different font styles, sizes, text orientation, and background noise. Addressing these challenges requires creative problem-solving skills.
  • Attention to detail: Since OCR involves recognizing individual characters in images, it requires a high level of attention to detail and precision.
  • Collaboration: OCR projects often require collaboration with other experts in fields such as image processing, machine learning, and software development.
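To make the binarization step concrete, here is a minimal pure-Python sketch of Otsu's method, one common way to choose a global threshold automatically. It assumes the image has already been converted to a flat list of 8-bit grayscale values (0-255); the function names are hypothetical, and this step is not part of the EasyOCR pipeline used later in the article. With OpenCV, the same idea is usually applied with cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU).

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit grayscale values.

    Pixels <= threshold form the "low" class (e.g. dark text) and pixels
    above it form the "high" class (e.g. light background).
    """
    # Histogram of intensity values 0-255
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with intensity <= t
    sum_low = 0  # sum of intensities <= t
    for t in range(256):
        w_low += hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        sum_low += t * hist[t]
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance; Otsu picks the t that maximizes it
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (<= threshold) or 255 (> threshold)."""
    return [0 if p <= threshold else 255 for p in pixels]


pixels = [10, 12, 15, 200, 210, 220]  # toy data: three dark, three light pixels
t = otsu_threshold(pixels)
print(t, binarize(pixels, t))  # 15 [0, 0, 0, 255, 255, 255]
```

The dark pixels end up on one side of the threshold and the light pixels on the other, which is exactly the text/background separation that binarization aims for.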

OCR Project Description:

The project will involve designing and implementing an automated text recognition program that can accurately detect and extract text from a variety of images. The program will use techniques such as image preprocessing, feature extraction, and machine learning algorithms to achieve this task. The final product will be a software tool that can be used for a variety of applications, such as digitizing text from documents, extracting information from receipts, and automatically recognizing text in images for accessibility purposes.
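Noise removal is another typical image preprocessing step. The sketch below applies a 3x3 median filter to a small grayscale image stored as a 2D list, which removes isolated "salt-and-pepper" pixels while keeping edges fairly sharp. It is a simplified illustration with a hypothetical function name; in practice you would more likely call OpenCV's cv2.medianBlur(gray, 3), which implements the same idea efficiently. Note that a median filter can also erase strokes only one pixel wide, so it suits noisy scans better than very fine print.

```python
def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2D list of grayscale values.

    At the borders the window is clipped to the pixels that exist, and for
    even-sized windows the upper of the two middle values is used. This is
    one of several possible border policies.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood around (r, c), clipped at the edges
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            window.sort()
            out[r][c] = window[len(window) // 2]
    return out


noisy = [
    [200, 200, 200],
    [200,   0, 200],  # isolated dark "pepper" pixel
    [200, 200, 200],
]
print(median_filter_3x3(noisy))
# [[200, 200, 200], [200, 200, 200], [200, 200, 200]]
```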

Step 1: Set up the environment

First, set up the environment by installing the required modules. Run the following command in a notebook code cell (the leading ! tells Google Colab or Jupyter to run the line as a shell command):

 
!pip install opencv-python-headless matplotlib easyocr

This will install the required modules: opencv-python-headless for image processing, matplotlib for visualization, and easyocr for text recognition.

Step 2: Load the image

Next, you need to load the image into the notebook. You can do this by uploading the image to your Google Drive and then mounting your Google Drive in the notebook. You can then load the image using the file path. Here’s the code to mount your Google Drive:

 
from google.colab import drive
drive.mount('/content/drive')

This will prompt you to authorize Google Colab to access your Google Drive; follow the on-screen instructions to grant access.

Once your Google Drive is mounted, load the image with the following code. Remember to upload a sample image first and replace the path with its location.

 
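import cv2
import matplotlib.pyplot as plt
# Load the image (replace the path with your own image's location)
image = cv2.imread('/content/drive/MyDrive/Datasets/OCR_Detection/Test_OCR.jpg')
# cv2.imread() returns None instead of raising an error if the file can't be read
if image is None:
    raise FileNotFoundError("Could not read the image; check the file path.")
# Display the image (OpenCV uses BGR channel order; Matplotlib expects RGB)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()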

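This code loads the image from the file path ‘/content/drive/MyDrive/Datasets/OCR_Detection/Test_OCR.jpg’ using cv2.imread() and displays it using matplotlib.pyplot.imshow(). Because OpenCV stores images in BGR channel order while Matplotlib expects RGB, cv2.cvtColor() converts the image before it is displayed.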

Step 3: Recognize text in the image

Next, recognize the text in the image using EasyOCR. Since EasyOCR was already installed in Step 1, you only need to import it, create a reader, and call its readtext() method. Here’s the code:

Import the EasyOCR library, a Python package that uses deep learning models (built on PyTorch) to detect and recognize text in images.

 
import easyocr

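Create an OCR reader object using the EasyOCR library. The lang_list argument sets the recognition language to English (‘en’), and gpu=True enables GPU acceleration. On first use, the reader downloads its pre-trained model weights; if no GPU is available, EasyOCR falls back to the CPU, which works but is slower.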

 
# Initialize the OCR reader
reader = easyocr.Reader(lang_list=['en'], gpu=True)
 
# Read the text in the image
results = reader.readtext(image)
  • The line above reads the text from the input image using the OCR reader.
  • It stores the results in a list called results.
  • Each element in the results list is a tuple containing the bounding box coordinates, recognized text, and confidence score for each detected word or line of text in the image.
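Because every tuple carries a confidence score, a common next step is to discard low-confidence detections before using the text. Below is a minimal sketch: filter_by_confidence() is a hypothetical helper, the 0.5 cutoff is an arbitrary starting point you would tune on your own images, and the sample list is made-up data in the same (bbox, text, confidence) shape that readtext() returns.

```python
def filter_by_confidence(results, min_conf=0.5):
    """Return only the (bbox, text, confidence) tuples with confidence >= min_conf."""
    return [(bbox, text, conf) for bbox, text, conf in results if conf >= min_conf]


# Made-up sample in the (bbox, text, confidence) shape that readtext() returns
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Invoice", 0.98),
    ([[10, 52], [90, 52], [90, 78], [10, 78]], "Total:", 0.93),
    ([[100, 50], [160, 50], [160, 80], [100, 80]], "$4Z.00", 0.38),
]

for bbox, text, conf in filter_by_confidence(sample, min_conf=0.5):
    print(f"{text} (confidence {conf:.2f})")
# Invoice (confidence 0.98)
# Total: (confidence 0.93)
```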
 
# Print the recognized text
for result in results:
    print(result[1])
Output:

shiksha
online
loptical Character Recognition
  • This block of code iterates over the results list using a for loop.
  • For each tuple in the list, it prints the second element of the tuple, which is the recognized text, to the console.

This code will initialize the OCR reader with English as the language and GPU acceleration enabled, and then recognize the text in the image using reader.readtext(). It will then loop through the results and print the recognized text.
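The loop above prints detections in the order EasyOCR returns them. If you need the text as lines in reading order (for example, when extracting fields from a receipt), one simple heuristic is to group boxes whose vertical centres are close and then sort each group left to right. The sketch below does that; group_into_lines() and its line_tol parameter (in pixels) are hypothetical, the sample data is made up, and the heuristic assumes roughly horizontal text. EasyOCR's readtext() also accepts paragraph=True to merge nearby text itself, which may be enough in many cases.

```python
def group_into_lines(results, line_tol=10):
    """Group (bbox, text, confidence) tuples into lines of text in reading order.

    A box joins the current line if its vertical centre is within `line_tol`
    pixels of the first box in that line (assumes roughly horizontal text).
    """
    def x_min(item):
        return min(x for x, _ in item[0])

    def y_center(item):
        ys = [y for _, y in item[0]]
        return (min(ys) + max(ys)) / 2

    lines = []
    for item in sorted(results, key=y_center):
        if lines and abs(y_center(item) - y_center(lines[-1][0])) <= line_tol:
            lines[-1].append(item)  # close enough vertically: same line
        else:
            lines.append([item])    # start a new line
    # Within each line, order the words left to right and join their text
    return [" ".join(text for _, text, _ in sorted(line, key=x_min)) for line in lines]


# Made-up sample in the (bbox, text, confidence) shape that readtext() returns
sample = [
    ([[100, 50], [160, 50], [160, 80], [100, 80]], "$42.00", 0.90),
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Invoice", 0.98),
    ([[10, 52], [90, 52], [90, 78], [10, 78]], "Total:", 0.93),
]

print("\n".join(group_into_lines(sample)))
# Invoice
# Total: $42.00
```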

Step 4: Draw bounding boxes around the text

Finally, you can draw bounding boxes around the text in the image using OpenCV. You can do this by looping through the results returned by EasyOCR and using OpenCV’s cv2.rectangle() method to draw a rectangle around each text region. Here’s the code:

 
# Initialize the OCR reader
reader = easyocr.Reader(lang_list=['en'], gpu=True)
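  • This line creates an OCR reader object using the EasyOCR library.
  • It specifies the language list to be used for OCR as English (‘en’) and sets GPU usage to True.
  • If you are running this in the same notebook session as Step 3, you can skip this line and reuse the existing reader.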
 
# Read the text in the image
try:
    results = reader.readtext(image)
except Exception as e:
    print(f"An error occurred while reading the text: {e}")
    results = []
  • This block of code reads the text from the input image using the OCR reader.
  • It wraps the OCR operation in a try-except block to handle any exceptions that might occur during OCR, such as if the input image is not valid.
  • If an exception occurs, it prints an error message and sets the results variable to an empty list.
 
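# Draw bounding boxes around the text and print the coordinates
for bbox, text, score in results:
    # OpenCV drawing functions need integer (x, y) tuples; EasyOCR may return NumPy or float values
    top_left = tuple(map(int, bbox[0]))
    bottom_right = tuple(map(int, bbox[2]))
    # (255, 0, 0) is blue in OpenCV's BGR channel order
    cv2.rectangle(image, top_left, bottom_right, (255, 0, 0), 5)
    print(f"BBox: {bbox}")
    # Add the recognized text just above the bounding box
    cv2.putText(image, text, (top_left[0], top_left[1] - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)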
Output:

BBox: [[542, 40], [708, 40], [708, 90], [542, 90]]
BBox: [[543, 78], [688, 78], [688, 126], [543, 126]]
BBox: [[0, 505], [1031, 505], [1031, 618], [0, 618]]
  • This loop iterates over the results of the OCR operation, which is a list of tuples containing the bounding box coordinates, text, and confidence score for each detected word or line of text.
  • For each tuple, the code draws a rectangle around the text in the input image using the OpenCV cv2.rectangle method.
  • It also prints the bounding box coordinates to the console.
  • Additionally, it adds the OCR text over the bounding box using the cv2.putText method.
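EasyOCR describes each box by four corner points (top-left, top-right, bottom-right, bottom-left, as in the coordinates printed above), and the drawing code passes bbox[0] and bbox[2] to cv2.rectangle(). For tilted text, those two points are not necessarily the extreme corners, so the rectangle can clip the text. A hedged alternative is to take the minimum and maximum x and y over all four points; quad_to_rect() below is a hypothetical helper that does this and returns plain Python ints, which cv2.rectangle() accepts as its two corner points.

```python
def quad_to_rect(bbox):
    """Turn a 4-point box [[x, y], ...] into ((x_min, y_min), (x_max, y_max)).

    Coordinates are rounded to plain Python ints so they can be passed
    directly as the corner points of cv2.rectangle().
    """
    xs = [int(round(x)) for x, _ in bbox]
    ys = [int(round(y)) for _, y in bbox]
    return (min(xs), min(ys)), (max(xs), max(ys))


# An axis-aligned box (the first BBox printed by the article's code)
print(quad_to_rect([[542, 40], [708, 40], [708, 90], [542, 90]]))
# ((542, 40), (708, 90))

# A hypothetical tilted box: bbox[0] and bbox[2] alone would give (10.4, 20.0) and (60.0, 40.0)
print(quad_to_rect([[10.4, 20.0], [50.6, 10.0], [60.0, 40.0], [20.0, 50.2]]))
# ((10, 10), (60, 50))
```

Inside the drawing loop, you could then write top_left, bottom_right = quad_to_rect(bbox) before calling cv2.rectangle(). For axis-aligned boxes like the ones in this example the result is the same; the difference only shows up for rotated text.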
 
# Display the image with the bounding boxes (convert BGR to RGB for Matplotlib)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.savefig('/content/drive/MyDrive/Datasets/OCR_Detection/image_with_bboxes.png', dpi=100, bbox_inches='tight')
plt.show()

Output:

[Figure: input image with bounding boxes and recognized text overlaid]

Finally, the code displays the input image with the overlaid bounding boxes and OCR text using Matplotlib's plt.imshow() and plt.show(), and saves a copy to Google Drive with plt.savefig().

Future Scope of OCR Project: DIY

Once the automated text recognition program is developed, there are several potential future applications and enhancements that could be considered. Some of these include:

  • Multi-language support: Add support for recognizing text in languages other than English. Logic: (1) train the OCR algorithm on datasets of different languages; (2) use pre-trained models designed specifically for multilingual recognition.
  • Handwritten text recognition: Add support for recognizing handwritten text. Logic: (1) train the OCR algorithm on datasets of handwritten text; (2) use pre-trained models designed specifically for handwriting recognition.
  • Table recognition: Add support for recognizing and extracting information from tables. Logic: (1) use computer vision techniques to identify table structures in documents; (2) use OCR to recognize and extract text from the cells in the table.
  • Improved accuracy: Improve the accuracy of the OCR program. Logic: (1) experiment with different algorithms, feature extraction techniques, or deep learning architectures; (2) use larger or more diverse training datasets to train the OCR algorithm.
  • Real-time recognition: Add support for real-time text recognition. Logic: (1) optimize the OCR algorithm for speed; (2) integrate the OCR program with technologies such as live video streams or camera feeds.
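To work on improved accuracy, you first need a way to measure it. One widely used metric is the character error rate (CER): the edit (Levenshtein) distance between the recognized text and a ground-truth transcription, divided by the length of the ground truth. The sketch below computes it in plain Python; the function names are hypothetical, and you would need your own hand-checked transcriptions to apply it to real images.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(levenshtein("kitten", "sitting"))                            # 3
print(f"{character_error_rate('recognition', 'recogniton'):.4f}")  # 0.0909
```

Running the same set of images through different preprocessing settings or models and comparing their CER against the same transcriptions gives you a single number to track, where lower is better.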

About the Author
Atul Harsha
Senior Manager Content

Experienced AI and Machine Learning content creator with a passion for using data to solve real-world challenges. I specialize in Python, SQL, NLP, and Data Visualization. My goal is to make data science engaging an...