Extract data from multiple PDFs in a given folder

djinkal · February 26, 2025, 1:41pm

I have almost 10,000 PDFs, each just 2 pages long. I want to extract data from them using a DataMapper and Workflow, where the workflow takes the files one by one and gives me a single output CSV containing the metadata of all the files together.

TGREER · February 26, 2025, 3:19pm

You may not need any Connect Resources. If the data extraction is straightforward, you can simply do a data selection in Workflow, followed by a Create File to place the extracted data, and a Send to Folder with “concatenate” checked.

djinkal · February 26, 2025, 4:19pm

Okay, that answers how I can get the output in a single file. Another question: how can I create a page-based data mapper for PDFs that have data on the front page only, so that the data mapper fetches data from the front page only? It should also take the PDFs one by one as input from a given folder.

TGREER · February 26, 2025, 4:39pm

The Folder Capture task will perform a loop, picking up each PDF and running it through all the tasks below it.

When you build your Workflow Process, on the Debug menu, you can “Select” a sample PDF file.

After your Folder Capture, place a Create File. Double-click Create File to open it, then right-click inside the large white area and select “Get Data Location”. Select the data you wish to extract, on the page you want it extracted from. That will place code inside Create File that is evaluated at run time to get your data.

It should look like this (random example):

region(1,0.86458,2.26041,5.375,2.71875,KeepCase,Trim)
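To make the pieces of that selection easier to read, here is a small Python sketch that splits it into named fields. This is only an illustration and not part of Workflow. It assumes the argument order is page, left, top, right, bottom, case option, trim option, which is how I understand the data selection syntax; the `Region` type and `parse_region` helper are hypothetical names. Under that assumption, the leading `1` is what restricts the selection to the front page.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    # Field order is an ASSUMPTION about the region() syntax:
    # page, left, top, right, bottom, case option, trim option.
    page: int
    left: float
    top: float
    right: float
    bottom: float
    case_option: str
    trim_option: str


def parse_region(expr: str) -> Region:
    """Split a region(...) data selection into named fields (hypothetical helper)."""
    match = re.fullmatch(r"region\((.*)\)", expr.strip())
    if match is None:
        raise ValueError(f"not a region() selection: {expr!r}")
    parts = [part.strip() for part in match.group(1).split(",")]
    if len(parts) != 7:
        raise ValueError(f"expected 7 arguments, got {len(parts)}")
    page, left, top, right, bottom, case_option, trim_option = parts
    return Region(int(page), float(left), float(top), float(right),
                  float(bottom), case_option, trim_option)


selection = parse_region("region(1,0.86458,2.26041,5.375,2.71875,KeepCase,Trim)")
print(selection)
```

If the assumption holds, every selection you make on the front page will start with `1`, so a second page in the PDF is simply never read.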

Make as many data selections as required, mixing them with static text as necessary to build your CSV record.

The next plugin is Send to Folder, with a file name specified and “concatenate” checked.

Three plugins should do all you need:

Folder Capture
Create File
Send to Folder

No data mapping required for this!
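If it helps to picture what those three plugins do together, here is a rough Python analogy. It is not Workflow code and not how the plugins are implemented. The loop stands in for Folder Capture, building one row stands in for Create File, and opening the output in append mode stands in for Send to Folder with “concatenate” checked. `build_combined_csv` and the `extract_fields` callback are hypothetical names; the callback is a placeholder for whatever reads the front-page values.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable


def build_combined_csv(
    input_dir: Path,
    output_csv: Path,
    extract_fields: Callable[[Path], Iterable[str]],
) -> int:
    """Append one CSV record per PDF in input_dir; return the number of records written."""
    count = 0
    # Append mode mirrors "concatenate": every record lands in the same file.
    with output_csv.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # Like Folder Capture: pick up each PDF in turn.
        for pdf_path in sorted(input_dir.glob("*.pdf")):
            # Like Create File: one record built from the extracted values.
            writer.writerow([pdf_path.name, *extract_fields(pdf_path)])
            count += 1
    return count
```

A call might look like `build_combined_csv(Path("in"), Path("out.csv"), my_extractor)`, where `my_extractor` is whatever function you supply to return the front-page values for one file.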