Lookup additional data from PDF files in datamapper

I have a zip file containing an Access file and a set of PDF files, which can be invoices, statements, remittance letters, credit letters, etc.

Each record in the Access file corresponds to one PDF file, and the document types appear in no particular order.
For instance:
Access file 1 could have 3 records: a statement, a credit letter and an upgrade offer letter (each a PDF)
Access file 2 could have 2 records: an invoice and a reminder letter (each a PDF)
etc…

Access file structure:
CustomerNumber, DocumentType, DocumentNumber, etc…
012345, Invoice, INV12345, etc…
456789, Credit, CR987654, etc…
etc…

The PDF files are named as follows:
CustomerNumber_DocumentType_DocumentNumber.pdf
Ex: 012345_Invoice_INV12345.pdf
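
To make the naming convention concrete, here is a minimal sketch in Python (not DataMapper script) that builds the expected file name from a record and splits a file name back into its parts. The function and class names are hypothetical, and it assumes none of the three values contains an underscore. The customer number is kept as a string so the leading zero survives.

```python
from typing import NamedTuple


class PdfName(NamedTuple):
    """The three parts encoded in a PDF file name."""
    customer_number: str
    document_type: str
    document_number: str


def expected_pdf_name(customer_number: str, document_type: str,
                      document_number: str) -> str:
    """Build the file name a record should map to."""
    return f"{customer_number}_{document_type}_{document_number}.pdf"


def parse_pdf_name(file_name: str) -> PdfName:
    """Split a PDF file name back into its three parts.

    Assumption: the values themselves never contain an underscore.
    """
    stem, dot, extension = file_name.rpartition(".")
    if not dot or extension.lower() != "pdf":
        raise ValueError(f"not a PDF file name: {file_name!r}")
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 3 underscore-separated parts: {file_name!r}")
    return PdfName(*parts)
```

For example, `parse_pdf_name("012345_Invoice_INV12345.pdf")` gives back the customer number, document type and document number as three separate strings.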

The datamapper is based on the access file.
Sometimes, a PDF might be named with the wrong customer number or invoice number and we need to open the PDF to check that these values match the ones in the access file. The customer and invoice number will be written on the first page of the PDF at specific locations.
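
The check I have in mind looks roughly like the Python sketch below. It only covers the comparison; it assumes the two values have already been read from the first page of the PDF by some other means, which is the part I don't know how to do in the datamapper. `find_mismatches` and the `pdf_values` dictionary are hypothetical names; the field names are the ones from the Access file.

```python
def find_mismatches(record: dict, pdf_values: dict) -> list:
    """Return a description for each field where the PDF and the record disagree.

    `record` is one row from the Access file; `pdf_values` holds the values
    read from the first page of the PDF (assumed to be extracted elsewhere).
    """
    problems = []
    for field in ("CustomerNumber", "DocumentNumber"):
        # Compare as trimmed strings so leading zeros are preserved
        # and stray whitespace from the extraction is ignored.
        expected = str(record.get(field, "")).strip()
        found = str(pdf_values.get(field, "")).strip()
        if expected != found:
            problems.append(
                f"{field}: record has {expected!r}, PDF has {found!r}"
            )
    return problems
```

An empty list means the PDF matches the record; anything else would be a reason to reject the job.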

Is there any way to open the PDF in an Action step for each record in the Access-based datamapper, to read the customer and document numbers inside the PDF?

So my datamapper is based on the Access file and I need to look up information from a PDF file to complement or validate each record. Is this possible? If so, which object of the datamapper API do I use?

I believe I’d do this validation within Workflow. Using a Job Preset, promote the various fields needed to name a PDF to “Connect Metadata”, and in your Connect Plugins be sure to check whatever boxes pull Connect Metadata into Workflow Metadata (“Output Records in Metadata”).

Then you can use Workflow metadata splitters, job infos, etc. to get these metadata fields, build a PDF filename, open the PDF, read the region in question, Text Condition to see if the values match, etc.

That approach is extremely slow when I have 5000+ PDFs, only to find that the 4998th record is invalid and the whole operation needs to be aborted.

I need a way to perform the validation in the datamapper with the Validate Only option in the Execute Datamapping plugin in Workflow.

Use the “Validate Only” option in the Execute Data Mapping task. That will tell you very quickly whether your data mapping configuration executes properly with any particular data file.

Check out this online help page for more information on that option’s output.