How to extract text from PDF and images on Mac (OCR)

Post Index

1. Launch the app and open a file
2. Select output and OCR options
3. Start extraction
4. Use the extracted text content
Advanced steps for increasing the conversion quality and your productivity:
1. Improve conversion quality
2. Run spelling check to quick fix incorrect recognition

Text Extractor for Mac is an easy-to-use OCR application. With advanced OCR function provided by Text Extractor for Mac, you can quickly extract text content out of your scanned paper document and images file, convert them into editable and searchable text content.

For the best OCR extraction results, please read this detailed instruction.

1. Launch the app and open a file

Open Finder > Applications and click on ‘Text Extractor’ icon. When the interface shows up, click ‘OPEN FILE’ button. Then you can select a file that you wish to perform extraction from the slide-down finder window.

text-extractor-interface

Supported input file formats:

PDF Document (.pdf)

Image formats: png, jpeg, jpg, tif, tiff, bmp, gif.

2. Select output and OCR options

If the imported file is not a scanned file or image, you don’t need to enable the OCR option and select document language.

If you import image file or scanned PDF file, please enable OCR, select the correct document language.

Tips: How to distinguish scanned PDF file from normal file>>

You can select convert all pages or current page.

text extractor select language

3. Start extraction

Click the ‘Extract’ button, the conversion will begin.

4. Use the extracted text content

When the extraction process is finished, you’ll see text content are available in the output area. You can edit the content directly within the output area, copy text content into the clipboard, or extract it as plain text file (.txt).

text extractor copy content

Advanced steps for increasing the conversion quality and your productivity:

1. Improve conversion quality

(1) Increase image resolution

The accuracy of OCR conversion depends on the quality of the original PDF. Poor document images quality and a skewed document may not be recognized accurately.

The image should be at least 300 dpi, and 600 dpi is recommended for document with smaller fonts. Or the text will be stuck together and OCR is hard to accurately recognize those text.

OCR can do a better job for documents with white background and black text content.

(2) Rotate page to the correct orientation

The PDF document or image must be right-side up.

(3) Select correct document language

Selecting the correct document language is very important, for example, if your PDF file is in Russian, but you choose English as a recognition language, the result will not correct.

(4) Point out the picture area

If you don’t want to extract text from picture areas or any particular areas of the imported file, please drag to point out those areas.

2. Run spelling check to quick fix incorrect recognition

OCR is not an easy task, some similar characters, for example, ‘e’ and ‘c’, ‘l’, ‘i’ and ‘!’ may not be recognized correctly. Running a spelling check can quickly locate those mistakes and correct them quickly.

Right-click on the output area, choose ‘Spelling and Grammar’ -> ‘Show Spelling and Grammar’.

text extractor check spelling

A window will pop up. For example, ‘skill’ is mistakenly recognized as ‘Skil!’, run the spelling checker to quickly discover this mistake, select the right one from the recommendation and click the ‘Change’ button.

text extractor check spelling 2

You may also interested in the following product:

PDF to Word OCR for Mac – Convert electronic and scanned PDF files into editable and well-formatted Microsoft Word document.