
What Is OCR (Optical Character Recognition)

Introduction to Optical Character Recognition (OCR)

OCR stands for Optical Character Recognition. An OCR application can recognize and extract text from scanned documents, such as PDF, TIFF, or other document image files. A PDF converter with OCR capability can convert a scanned PDF document into editable text.

Definition from Wikipedia:

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.

Why Would I Need OCR Apps?

If you have a scanned document in a format such as TIFF, PDF, or PNG, you can't alter, copy, or remove any of its text.


When you scan a paper document and save it in any supported format, the whole page is captured as an image rather than as text and font information. Before you can make changes to it, you need to convert the image into editable content.
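To make the image-versus-text distinction concrete, here is a minimal Python sketch of one rough way to guess whether a PDF is image-only. It is an assumption-laden heuristic, not how any particular OCR product works: the function name looks_image_only is hypothetical, and the check simply looks for image objects and font entries in the raw bytes of the file. PDFs that store their page resources in compressed object streams can hide font entries, so a result of True should be treated as a hint rather than a verdict.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Rough guess: True if the raw PDF bytes mention images but no fonts.

    Hypothetical helper for illustration only. It scans the uncompressed
    parts of the file, so fonts declared inside compressed object streams
    will not be seen and the file may be misreported as image-only.
    """
    # A page with real text normally references at least one font.
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    # A scanned page is normally stored as an image XObject.
    has_image = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return has_image and not has_font


if __name__ == "__main__":
    # Tiny hand-written fragments standing in for real files (assumed data).
    scanned_like = b"<< /Type /XObject /Subtype /Image /Width 10 /Height 10 >>"
    text_like = b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
    print(looks_image_only(scanned_like))
    print(looks_image_only(text_like))
```

For a real file, you could pass in the result of reading it in binary mode. If the function suggests the PDF is image-only, that is the case where OCR is needed to get editable text.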

That is exactly what an OCR app does: it extracts text from document images or PDF files. You then have an electronic copy that you can open in applications such as TextEdit, iWork Pages, Microsoft Word, or any other word processor, where you can copy, edit, reuse, or search the content without hassle.


Related tutorials:

How to Convert Scanned PDF with PDF to Word ++

How can you distinguish scanned PDF from normal PDF file?