Tips for extracting tables from PDF

Tables are widely used in documents such as application forms, bank statements, reports and price lists, to name a few. Is it possible to convert tables from PDF into Excel, Numbers or other Office documents and then edit or calculate the data? Unfortunately, tables usually lose their formatting if you simply copy and paste them into Word, Excel or other documents.

In this guide, you’ll learn the best practices for extracting tables from PDF into the output document more accurately with the Lighten PDF Converter apps. (The Lighten PDF Converter apps include PDF to Word Converter, PDF to Excel Converter, PDF Converter Master, PDF to Word OCR and PDF Converter OCR, in both Windows and Mac versions. This guide applies to all of the apps mentioned above.)

Tables in PDF can be simple or extremely complicated, so we have to consider the majority of situations when designing the app. The Lighten PDF Converter app detects tables based on the location and formatting of the data and on the table borders (if any). Currently there are 2 ways to detect and convert PDF tables.
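To give a feel for what detecting a table from the location of the data can mean, here is a minimal Python sketch. It is not the algorithm used in Lighten PDF Converter. It assumes the text has already been extracted as words with page coordinates, and the names Word, group_rows, column_spans and to_grid, as well as the tolerance values, are made up for this illustration. Words on roughly the same baseline form a row, and wide horizontal gaps shared by all rows are treated as column boundaries.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One extracted word (hypothetical structure for this sketch)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline, measured from the top of the page


def group_rows(words, y_tol=2.0):
    """Group words whose baselines lie within y_tol of the row's first word."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x0)):
        if rows and abs(rows[-1][0].y - w.y) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return rows


def column_spans(words, min_gap=8.0):
    """Merge the horizontal extents of all words.

    Extents closer than min_gap are merged into one span, so each
    remaining span is treated as one column.
    """
    spans = []
    for x0, x1 in sorted((w.x0, w.x1) for w in words):
        if spans and x0 - spans[-1][1] < min_gap:
            spans[-1][1] = max(spans[-1][1], x1)
        else:
            spans.append([x0, x1])
    return [tuple(s) for s in spans]


def to_grid(words, y_tol=2.0, min_gap=8.0):
    """Return the table as a list of rows, each a list of cell strings."""
    cols = column_spans(words, min_gap)
    grid = []
    for row in group_rows(words, y_tol):
        cells = [[] for _ in cols]
        for w in sorted(row, key=lambda w: w.x0):
            for i, (c0, c1) in enumerate(cols):
                if c0 <= w.x0 <= c1:
                    cells[i].append(w.text)
                    break
        grid.append([" ".join(c) for c in cells])
    return grid


# Made-up sample data: a small price list without borders.
sample = [
    Word("Item", 50, 75, 100), Word("Unit", 150, 172, 100),
    Word("price", 175, 200, 100), Word("Qty", 260, 280, 100),
    Word("Pen", 50, 70, 115), Word("1.20", 150, 172, 115),
    Word("10", 260, 272, 115),
    Word("Notebook", 50, 100, 130.5), Word("3.50", 150, 172, 130),
    Word("4", 260, 266, 130),
]

for row in to_grid(sample):
    print(row)
```

A sketch like this only works for regular tables. Merged cells, wrapped text or columns with very little whitespace between them break the gap heuristic, which is why a manual option is useful for complex tables.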

1. Detect and extract table from PDF directly

You don’t have to do anything: just click the ‘Convert’ button and the app will detect and convert any table data it can find.

Suitable for:

PDF files with simple or regular tables, or tables with clear borders. Lighten PDF Converter apps can accurately detect and extract most tables with borders.

Pros:

Easy and quick. You don’t need to learn anything: simply add a PDF file, select the output file and click the ‘Convert’ button.

Cons:

Every detected table is output as a table in the output document, even when you only need table format for some specific data. And tables that are not detected will not be converted as tables.

How to extract tables from PDF directly?
Simply add the PDF file to Lighten PDF Converter, choose an output format and click the ‘Convert’ button. The app will detect the tables and extract them from the PDF to the output document.

2. Manually mark table area

If the auto-conversion is not correct, you can select the table data manually. After you mark the table areas, the app will try to detect tables in the selected parts. If the detection is still not correct, you can add rows and columns, or split or merge cells, before conversion.

Suitable for:

PDF files with complex table structures, or without any table borders. Some PDF files have very complex tables with lots of merged cells, columns and rows. Others have good-looking tables in a customized style. We recognize a table at first glance, but the application has to perform complicated calculations and run thousands of lines of code to pick the table out of the rest of the PDF data. So if you give the app some hints by marking the table areas, the accuracy will be greatly enhanced.
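The following Python sketch illustrates why a marked table area helps so much. It is only an illustration under stated assumptions, not the code inside Lighten PDF Converter: the function cells_from_marked_area, the word tuples and the divider lists are all hypothetical. Once the rectangle and the row and column lines are given by the user, nothing has to be guessed, and placing each word in a cell becomes a simple lookup.

```python
from bisect import bisect_right


def cells_from_marked_area(words, area, col_lines, row_lines):
    """Assign words to cells of a user-marked table (illustrative only).

    words:     (text, x, y) tuples, x and y being the centre of each word,
               with y measured from the top of the page
    area:      (left, top, right, bottom) of the marked rectangle
    col_lines: sorted x positions of the dividers between columns
    row_lines: sorted y positions of the dividers between rows
    """
    left, top, right, bottom = area
    n_rows, n_cols = len(row_lines) + 1, len(col_lines) + 1
    grid = [[[] for _ in range(n_cols)] for _ in range(n_rows)]

    # Reading order: top to bottom, then left to right.
    # Assumes words on one text line share the same y value.
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if not (left <= x <= right and top <= y <= bottom):
            continue  # outside the marked area, so not part of the table
        r = bisect_right(row_lines, y)  # number of row dividers above the word
        c = bisect_right(col_lines, x)  # number of column dividers left of it
        grid[r][c].append(text)

    return [[" ".join(cell) for cell in row] for row in grid]


# Made-up page content: a heading, a 2 x 2 table and a page footer.
page_words = [
    ("Invoice", 60, 40),
    ("Item", 60, 100), ("Price", 160, 100),
    ("Pen", 60, 120), ("1.20", 160, 120),
    ("Page", 60, 300), ("1", 85, 300),
]

table = cells_from_marked_area(
    page_words,
    area=(40, 90, 220, 130),
    col_lines=[110],
    row_lines=[110],
)
print(table)
```

In this sketch the heading and the footer are ignored because they fall outside the marked rectangle, and merging two cells would amount to removing the divider between them.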

Pros:

Flexible. It can deal with most tables in PDF, and even the most complex tables can be converted well.

Cons:

It takes some extra steps before conversion, and a few minutes to learn how to use this function.

How to extract tables from PDF manually?
First add the PDF file to the Lighten PDF Converter app. In the built-in PDF reader, click the ‘Show table’ button on the toolbar, and the app will detect the tables on the current page.

If the app cannot find any tables, nothing will happen when you click the ‘Show table’ button.

If the app finds a table, it will highlight the table with a blue frame. You can see the lines for columns and rows.

Move your mouse cursor over the highlighted area to move or resize the table area. You can also edit the width and height of the columns and rows. If there are merged cells in the original PDF, you can select several cells and merge them into one.

A detailed tutorial on how to extract tables from PDF is also available.

After editing, the output will look exactly the same as the table areas you marked. It takes a few extra steps, but you won’t need to fix the table in the output file, which usually takes more time.

If you don’t want to keep the table format for some content in the output document, go to the built-in PDF reader and click ‘Show table’. The app will highlight the detected tables. Select the one you don’t need and click ‘Remove table area’. The app will then not output that part as a table.

Lighten Software’s development team is working hard on table recognition and conversion. We’d like to offer flexible table conversion features that no other PDF converter on the market provides, and to convert even the most complicated table data. We’d like to make the feature more powerful, so if you have any suggestions for improving it, please do not hesitate to contact us via support[at]lightenpdf.com