OCR: Automated Text Recognition from Images
OCR (Optical Character Recognition) is a technology that extracts text from images. Its applications include digitizing printed documents, extracting text from images for translation or search, and assisting visually impaired individuals. In this project, we will develop an OCR system that automatically recognizes text in images. We will use EasyOCR, a library built on pre-trained deep learning models, to recognize characters and convert them into machine-readable text. Our goal is an efficient OCR system that handles a wide range of input images and produces accurate results.
The objective of this project is to develop a program that can automatically recognize and extract text from images using computer vision and machine learning techniques.
Prerequisites for Optical Character Recognition:
- Programming skills in Python
- Basic image processing knowledge (resizing, binarization, noise removal)
- Knowledge of machine learning and deep learning concepts
- Understanding of linear algebra
- Knowledge of data structures and algorithms
- Familiarity with NLP concepts
Skills you will Learn from this project:
- Image processing: OCR requires preprocessing of images, such as resizing, binarization, noise removal, etc. These techniques are useful in many other computer vision projects.
- Machine learning: OCR systems use machine learning models to recognize text in images. Working on such a project can give you experience in building, training, and deploying machine learning models.
- Programming: OCR projects typically require programming skills in a language such as Python. Developing OCR algorithms and integrating them with other software requires proficiency in programming.
- Problem-solving: OCR projects present many challenges such as handling different font styles, sizes, text orientation, and background noise. Addressing these challenges requires creative problem-solving skills.
- Attention to detail: Since OCR involves recognizing individual characters in images, it requires a high level of attention to detail and precision.
- Collaboration: OCR projects often require collaboration with other experts in fields such as image processing, machine learning, and software development.
OCR Project Description:
The project will involve designing and implementing an automated text recognition program that can accurately detect and extract text from a variety of images. The program will use techniques such as image preprocessing, feature extraction, and machine learning algorithms to achieve this task. The final product will be a software tool that can be used for a variety of applications, such as digitizing text from documents, extracting information from receipts, and automatically recognizing text in images for accessibility purposes.
Step 1: Set up the environment
First, you need to set up the environment by installing the required modules. You can do this by running the following code in a code cell:
!pip install opencv-python-headless matplotlib easyocr
This will install the required modules: opencv-python-headless for image processing, matplotlib for visualization, and easyocr for text recognition.
Step 2: Load the image
Next, you need to load the image into the notebook. You can do this by uploading the image to your Google Drive and then mounting your Google Drive in the notebook. You can then load the image using the file path. Here’s the code to mount your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
This will prompt you to enter an authorization code, which you can obtain by following the instructions.
Once you’ve mounted your Google Drive, you can load the image using the following code. Do remember to upload a sample image and use its path.
import cv2
import matplotlib.pyplot as plt

# Load the image
image = cv2.imread('/content/drive/MyDrive/Datasets/OCR_Detection/Test_OCR.jpg')

# Display the image
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()
This code will load the image from the file path ‘/content/drive/MyDrive/Datasets/OCR_Detection/Test_OCR.jpg’ using cv2.imread() and display it using matplotlib.pyplot.imshow().
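Note that cv2.imread() does not raise an error when it cannot read a file; it returns None, so a mistyped path only shows up later as a confusing failure. A small pre-check can catch this early. The sketch below is an assumption-laden illustration: check_image_path is a hypothetical helper, and the extension set is an arbitrary choice for this example (OpenCV can read other formats too).

```python
import tempfile
from pathlib import Path

# Assumed set of extensions for this sketch; adjust it to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def check_image_path(path):
    """Return the path as a Path if it points to an existing image-like file."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No file found at {p}")
    if p.suffix.lower() not in IMAGE_EXTENSIONS:
        raise ValueError(f"Unexpected file extension: {p.suffix!r}")
    return p


# Usage with a placeholder file (empty, so not a real image)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Test_OCR.jpg"
    sample.write_bytes(b"")
    print(check_image_path(sample).name)
```

This only validates the path. After calling cv2.imread(), it is still worth confirming that the returned value is not None before using the image.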
Step 3: Recognize text in the image
Next, you need to recognize the text in the image using EasyOCR. EasyOCR was installed in Step 1, so you only need to import it, create a reader, and call its readtext() method. Here’s the code:
First, import the EasyOCR library, a Python OCR package that uses deep learning models to recognize text in images.
import easyocr
Next, create an OCR reader object. The call specifies English (‘en’) as the language list and sets gpu=True to request GPU acceleration; if no GPU is available, EasyOCR falls back to the CPU.
# Initialize the OCR reader
reader = easyocr.Reader(lang_list=['en'], gpu=True)

# Read the text in the image
results = reader.readtext(image)
- The line above reads the text from the input image using the OCR reader.
- It stores the results in a list called results.
- Each element in the results list is a tuple containing the bounding box coordinates, recognized text, and confidence score for each detected word or line of text in the image.
# Print the text
for result in results:
    print(result[1])
Output: shiksha online loptical Character Recognition
- This block of code iterates over the results list using a for loop.
- For each tuple in the list, it prints the second element of the tuple, which is the recognized text, to the console.
This code will initialize the OCR reader with English as the language and GPU acceleration enabled, and then recognize the text in the image using reader.readtext(). It will then loop through the results and print the recognized text.
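Each result tuple also carries a confidence score, which the loop above does not use. A minimal sketch of filtering out low-confidence detections is shown below. The sample data only imitates the shape of the readtext() output: the boxes are the ones printed in Step 4, but the way the text is split across them and the confidence values are invented for illustration. The helper name filter_by_confidence and the 0.5 threshold are likewise assumptions.

```python
# Sample data shaped like reader.readtext() output: (bbox, text, confidence).
# The text split and the confidence values are made up for this example.
sample_results = [
    ([[542, 40], [708, 40], [708, 90], [542, 90]], "shiksha", 0.98),
    ([[543, 78], [688, 78], [688, 126], [543, 126]], "online", 0.41),
    ([[0, 505], [1031, 505], [1031, 618], [0, 618]],
     "loptical Character Recognition", 0.87),
]


def filter_by_confidence(results, min_confidence=0.5):
    """Keep only detections whose confidence is at least min_confidence."""
    return [
        (bbox, text, confidence)
        for bbox, text, confidence in results
        if confidence >= min_confidence
    ]


kept = filter_by_confidence(sample_results, min_confidence=0.5)
for bbox, text, confidence in kept:
    print(f"{text!r} (confidence {confidence:.2f})")
```

With real data, you would pass the results list from reader.readtext() instead of sample_results and tune the threshold to your images.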
Step 4: Draw bounding boxes around the text
Finally, you can draw bounding boxes around the text in the image using OpenCV. You can do this by looping through the results returned by EasyOCR and using OpenCV’s cv2.rectangle() method to draw a rectangle around each text region. Here’s the code:
# Initialize the OCR reader
reader = easyocr.Reader(lang_list=['en'], gpu=True)
- This line creates an OCR reader object using the EasyOCR library.
- It specifies the language list to be used for OCR as English (‘en’) and sets GPU usage to True.
# Read the text in the image
try:
    results = reader.readtext(image)
except Exception as e:
    print(f"An error occurred while reading the text: {e}")
    results = []
- This block of code reads the text from the input image using the OCR reader.
- It wraps the OCR operation in a try-except block to handle any exceptions that might occur during OCR, such as if the input image is not valid.
- If an exception occurs, it prints an error message and sets the results variable to an empty list.
# Draw bounding boxes around the text and print the coordinates
for bbox, text, score in results:
    # OpenCV expects integer pixel coordinates for the corner points
    top_left = tuple(map(int, bbox[0]))
    bottom_right = tuple(map(int, bbox[2]))
    cv2.rectangle(image, top_left, bottom_right, (255, 0, 0), 5)
    print(f"BBox: {bbox}")
    # Add text over the bounding box
    cv2.putText(image, text, (top_left[0], top_left[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
Output:
BBox: [[542, 40], [708, 40], [708, 90], [542, 90]]
BBox: [[543, 78], [688, 78], [688, 126], [543, 126]]
BBox: [[0, 505], [1031, 505], [1031, 618], [0, 618]]
- This loop iterates over the results of the OCR operation, which is a list of tuples containing the bounding box coordinates, text, and confidence score for each detected word or line of text.
- For each tuple, the code draws a rectangle around the text in the input image using the OpenCV cv2.rectangle method.
- It also prints the bounding box coordinates to the console.
- Additionally, it adds the OCR text over the bounding box using the cv2.putText method.
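The loop above treats bbox[0] and bbox[2] as opposite corners of the rectangle, which works for upright text. For tilted text the four corner points may not line up with the image axes, so taking the minimum and maximum of all four points is a safer way to get an enclosing rectangle. The helper below is a hedged sketch; bbox_to_rect is a hypothetical name, not an EasyOCR or OpenCV function.

```python
def bbox_to_rect(bbox):
    """Convert four (x, y) corner points to integer (top_left, bottom_right)."""
    # Round to whole pixels, since drawing functions need integer coordinates
    xs = [int(round(x)) for x, _ in bbox]
    ys = [int(round(y)) for _, y in bbox]
    return (min(xs), min(ys)), (max(xs), max(ys))


# An upright box taken from the output above
upright = [[542, 40], [708, 40], [708, 90], [542, 90]]
print(bbox_to_rect(upright))

# A slightly tilted box with float coordinates (made-up values)
tilted = [[10.4, 20.6], [50.2, 15.1], [52.0, 40.9], [12.3, 45.2]]
print(bbox_to_rect(tilted))
```

The two returned tuples can be passed directly as the corner arguments of cv2.rectangle().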
# Display the image with the bounding boxes
# OpenCV stores images as BGR, so convert to RGB for Matplotlib
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.savefig('/content/drive/MyDrive/Datasets/OCR_Detection/image_with_bboxes.png', dpi=100, bbox_inches='tight')
plt.show()
Finally, this code displays the input image with the bounding boxes and OCR text overlaid, using plt.imshow and plt.show from the Matplotlib library, and saves a copy to Google Drive with plt.savefig.
Future Scope of OCR Project: DIY
Once the automated text recognition program is developed, there are several potential future applications and enhancements that could be considered. Some of these include:
Feature | Description | Logic |
---|---|---|
Multi-language support | Add support for recognizing text in languages other than English | 1. Train the OCR algorithm on datasets of different languages 2. Use pre-trained models that have been specifically designed for multi-lingual recognition |
Handwritten text recognition | Add support for recognizing handwritten text | 1. Train the OCR algorithm on datasets of handwritten text 2. Use pre-trained models that have been specifically designed for handwriting recognition |
Table recognition | Add support for recognizing and extracting information from tables | 1. Use computer vision techniques to identify table structures in documents 2. Use OCR to recognize and extract text from cells in the table |
Improved accuracy | Improve the accuracy of the OCR program | 1. Experiment with different algorithms, feature extraction techniques, or deep learning architectures 2. Use larger or more diverse training datasets to train the OCR algorithm |
Real-time recognition | Add support for real-time recognition of text | 1. Optimize the OCR algorithm for speed 2. Integrate the OCR program with technologies such as live video streams or camera feeds |
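Before working on the "Improved accuracy" item in the table above, it helps to have a way to measure accuracy. One common metric is the character error rate (CER): the edit distance between the recognized text and the correct text, divided by the length of the correct text. The sketch below is a plain-Python illustration. The recognized string comes from the Step 3 output, while the reference string is an assumption about what the image actually says.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(recognized, reference):
    """Edit distance divided by the reference length (lower is better)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(recognized, reference) / len(reference)


recognized = "loptical Character Recognition"  # from the Step 3 output
reference = "Optical Character Recognition"    # assumed ground truth
cer = character_error_rate(recognized, reference)
print(f"CER: {cer:.3f}")
```

Tracking this number over a small labeled set of images gives a simple way to compare preprocessing choices or models.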