How to extract text from PDF and images on Mac (OCR)
Text Extractor for Mac is an easy-to-use OCR application. With advanced OCR function provided by Text Extractor for Mac, you can quickly extract text content out of your scanned paper document and images file, convert them into editable and searchable text content.
For the best OCR extraction results, please read this detailed instructions.
Open Finder > Applications and click on 'Text Extractor' icon. When the interface shows up, click 'OPEN FILE' button. Then you can select a file that you wish to perform extraction from the slide-down finder window.
Supported input file formats:
PDF Document (.pdf)
Image formats: png, jpeg, jpg, tif, tiff, bmp, gif.
If the imported file is not an scanned file or image, you don't need to enable OCR option and select document language.
If you import image file or scanned PDF file, please enable OCR, select the correct document language.
Tips: How to distinguish scanned PDF file from normal file>>
You can select convert all pages or current page.
Click 'Extract' button, conversion will begin.
When the extraction process is finished, you'll see text content are available in the output area. You can edit the content directly within the output area, copy text content into the clipboard, or extract it as plain text file (.txt) .
The accuracy of OCR conversion depends on the quality of original PDF. Poor document images quality and skewed document may not be recognized accurately.
Image should be at least 300 dpi, and 600 dpi is recommended for document with smaller fonts. Or the text will stuck together and OCR is hard to accurately recognize those text.
OCR can do a better job for documents with white background and black text content.
The PDF document or image must be right-side up.
Selecting the correct document language is very important, for example, if your PDF file is in Russian, but you choose English as recognition language, the result will not correct.
If you don't want to extract text from picture areas, or any particular areas of the imported file, please drag to point out those areas.
OCR is not an easy task, some similar characters, for example, 'e' and 'c', 'l', 'i' and '!' may not be recognized correctly. Running spelling check can quickly locate those mistakes and correct them quickly.
Right click on the output area, choose 'Spelling and Grammar' -> 'Show Spelling and Grammar'.
A window will pop up. For example, 'skill' is mistakenly recognized as 'Skil!', run the spelling check to quickly discover this mistake, select the right one from the recommendation and click 'Change' button.
You may also interested in following product:
PDF to Word OCR for Mac - Convert eletronic and scanned PDF files into editable and well-formatted Microsoft Word document.