Python OCR Text Scanner with Text To Speech

What Is OCR?

OCR stands for Optical Character Recognition. OCR systems transform a
two-dimensional image of text, which may contain machine-printed or
handwritten text, from its image representation into machine-readable text.
To be as accurate as possible, OCR as a process generally consists of several
sub-processes:

  • Preprocessing of the Image
  • Text Localization
  • Character Segmentation
  • Character Recognition
  • Post Processing

The sub-processes in the list above can of course differ, but these are roughly
the steps needed to approach automatic character recognition. The main aim of
OCR software is to identify and capture all the distinct words, in whatever
language they are written, from the characters in the image.
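
The first three steps in the list can be illustrated on a toy scale. The sketch below is not how a real OCR engine works internally; it assumes the "image" is a small grid of grayscale values (0 = black, 255 = white), and the function names binarize, locate_text and segment_characters are made up for this illustration.

```python
def binarize(gray, threshold=128):
    """Preprocessing: map each grayscale pixel to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def locate_text(binary):
    """Text localization: bounding box (top, bottom, left, right) of all ink."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return None  # blank page, nothing to localize
    return rows[0], rows[-1], cols[0], cols[-1]


def segment_characters(binary):
    """Character segmentation: split on blank columns into (start, end) spans."""
    spans, start = [], None
    width = len(binary[0])
    for c in range(width):
        has_ink = any(row[c] for row in binary)
        if has_ink and start is None:
            start = c                      # a new glyph begins
        elif not has_ink and start is not None:
            spans.append((start, c - 1))   # the glyph ended one column ago
            start = None
    if start is not None:
        spans.append((start, width - 1))   # glyph touching the right edge
    return spans


# A 4x7 toy image with two "glyphs": one 1 pixel wide, one 2 pixels wide.
image = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0, 255, 255,   0,   0, 255],
    [255,   0, 255, 255,   0,   0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
binary = binarize(image)
print(locate_text(binary))         # (1, 2, 1, 5)
print(segment_characters(binary))  # [(1, 1), (4, 5)]
```

Character recognition and post-processing are the hard parts and are left to a real engine such as Tesseract.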

For almost two decades, optical character recognition systems have been
widely used to provide automated text entry into computerized systems. Yet in
all this time, conventional online OCR systems (like zonal OCR) have never
overcome their inability to read more than a handful of type fonts and page
formats. Proportionally spaced type (which includes virtually all typeset copy),
laser printer fonts, and even many non-proportional typewriter fonts, have
remained beyond the reach of these systems. And as a result, conventional OCR
has never achieved more than a marginal impact on the total number of
documents needing conversion into digital form.

Python OCR Process Flow

What Is Pytesseract?

Python-tesseract is an optical character recognition (OCR) tool
for Python. That is, it will recognize and “read” the text embedded in images.
Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine. It is also
useful as a stand-alone invocation script to tesseract, as it can read all image
types supported by the Pillow and Leptonica imaging libraries, including jpeg,
png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract
will print the recognized text instead of writing it to a file.
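
As a minimal sketch of calling the wrapper from Python, the example below uses pytesseract.image_to_string together with Pillow's Image.open. It assumes that the pytesseract and Pillow packages and the Tesseract binary itself are installed. The helper names clean_text and image_to_clean_text, and the file name scan.png, are hypothetical and only serve this illustration.

```python
import re


def clean_text(raw):
    """Light post-processing of raw OCR output."""
    text = re.sub(r"-\n(?=\w)", "", raw)  # rejoin words hyphenated across lines
    text = re.sub(r"\s+", " ", text)      # collapse newlines, tabs, form feeds
    return text.strip()


def image_to_clean_text(path, lang="eng"):
    """Run Tesseract on the image at `path` and return tidied text."""
    # Imported lazily: these need Pillow, pytesseract and the Tesseract binary.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return clean_text(raw)
```

With everything installed, print(image_to_clean_text("scan.png")) would print the recognized text of that (hypothetical) file on a single line.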

What Is Pillow?

The Python Pillow module is a fork of PIL (the Python Imaging Library). It
is one of the essential modules for image processing in Python. The original PIL
is not supported on Python 3, but Pillow is, and it is still imported under the
name PIL. It supports a variety of image formats such as jpeg, png, bmp, gif,
ppm, and tiff. We can perform many operations on digital images using the
Pillow module. In the upcoming section, we will learn various operations on
images, such as filtering images, creating thumbnails, merging images, cropping
images, blurring an image, resizing an image, creating a watermark, and many
other operations. Before we start working with Pillow, we need to install the
library on our local machine. We can do it by typing the following command in
the terminal:

    pip install Pillow
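
A few of the operations mentioned above (resizing, cropping, blurring, and creating a thumbnail) might look like the following sketch. It assumes Pillow is installed. The names fit_within and demo_operations and the 400x400 and 128x128 sizes are arbitrary choices for this example, not part of Pillow.

```python
def fit_within(size, box):
    """Largest (width, height) with the aspect ratio of `size` that fits in `box`."""
    width, height = size
    scale = min(box[0] / width, box[1] / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def demo_operations(path):
    """Open the image at `path` and return resized, cropped, blurred, thumbnail copies."""
    # Imported lazily so fit_within can be tried even where Pillow is missing.
    from PIL import Image, ImageFilter

    img = Image.open(path)
    resized = img.resize(fit_within(img.size, (400, 400)))
    # crop takes a (left, upper, right, lower) box; this keeps the top-left quarter
    cropped = img.crop((0, 0, img.width // 2, img.height // 2))
    blurred = img.filter(ImageFilter.GaussianBlur(radius=2))
    thumb = img.copy()
    thumb.thumbnail((128, 128))  # modifies the copy in place, keeps aspect ratio
    return resized, cropped, blurred, thumb
```

Note that thumbnail changes the image in place while resize, crop and filter return new images, which is why the sketch copies the image before calling thumbnail.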

What Is Pyttsx3?

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3. An application invokes the pyttsx3.init() factory function to get a reference to a pyttsx3.Engine instance. It is a very easy-to-use tool that converts the entered text into speech. The pyttsx3 module can switch between voices, for example the female and male voices that “sapi5” provides on Windows. It supports three TTS engines:

  • sapi5 – SAPI5 on Windows
  • nsss – NSSpeechSynthesizer on Mac OS X
  • espeak – eSpeak on every other platform
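
Putting the pieces above together, a minimal sketch of speaking a string could look like this. It assumes pyttsx3 and a working speech driver for the platform are installed. The helper names pick_voice and speak, the default rate of 150, and the idea of selecting a voice by a keyword in its name are assumptions made for this example; the voices actually available depend on the machine.

```python
def pick_voice(voices, keyword):
    """Return the first voice whose name contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice
    return None  # no installed voice matched


def speak(text, voice_keyword=None, rate=150):
    """Speak `text` aloud, optionally with a voice matching `voice_keyword`."""
    import pyttsx3  # imported lazily: needs a speech driver to be available

    engine = pyttsx3.init()           # picks sapi5, nsss or espeak by platform
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    if voice_keyword:
        voice = pick_voice(engine.getProperty("voices"), voice_keyword)
        if voice is not None:
            engine.setProperty("voice", voice.id)
    engine.say(text)
    engine.runAndWait()               # blocks until speech has finished
```

For example, speak("Hello from the OCR scanner") uses the default voice, and the text returned by an OCR step can be passed in the same way.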
