How to extract metadata from PDF-VT in Workflow

ingrid · November 23, 2021, 12:04pm

Hi,

I am supplied with individual PDF files in PDF-VT format which contain metadata fields at document level. There are 3 fields: destinationPrinter, archiveServer, documentType.

These fields are needed to print the document to the correct printer as fast as possible and to archive it to the corresponding server.

I am aware I could use the DataMapper to extract data from PDF-VT, but I am worried this will add a few more seconds of processing time, and the document needs to be printed as fast as possible.

Is there any way of extracting metadata from PDF-VT in Workflow?

If this is not possible, is there any way to extract the PDF property values such as Title, Subject and Keywords?

Thanks
Ingrid

Alex_Banahene · November 24, 2021, 11:08am

Hi ingrid,

Any method of extracting that data is going to take some time, but a PDF-VT extraction with the DataMapper should be usably quick. If this is causing performance issues, please let us know.

There aren’t any plugins in Workflow other than the DataMapper which can extract metadata from a PDF-VT. Unless you want to add the metadata to your PDFs as white text, the DataMapper is the best method of extracting it, short of using a 3rd party application or your own custom plugin.

I hope this helps.

Thanks
Alex Banahene

ingrid · November 24, 2021, 1:03pm

Hello,

The PDF background uses a special graphic, so any white text would be visible.

My team has used the Alambic API to extract information from the PDF properties instead of the metadata, but it would be nice if the Workflow Alambic API could also extract metadata from PDF-VT in the future. Could you raise a feature request?

Thank you anyway.
Ingrid.
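
For anyone who wants to check outside Workflow what the PDF properties mentioned above (Title, Subject, Keywords) contain, here is a minimal Python sketch. It is not the Alambic API and not how Ingrid's team did it; the function name `read_pdf_properties` and the sample bytes are made up for illustration. It assumes the document information dictionary is stored uncompressed, unencrypted and as plain literal strings. It can also pick up a /Title that belongs to a bookmark, so treat it as a quick check rather than a proper PDF parser.

```python
import re

# Matches "/Key (value)" entries whose value is a PDF literal string.
# Nested unescaped parentheses and hex strings (<...>) are not handled.
_INFO_ENTRY = re.compile(
    rb"/(Title|Subject|Keywords)\s*\(((?:\\.|[^\\()])*)\)"
)


def read_pdf_properties(pdf_bytes: bytes) -> dict:
    """Return Title/Subject/Keywords found as literal strings in raw PDF bytes.

    Hypothetical helper for illustration only. If a key occurs more than
    once (for example after an incremental save), the last occurrence wins.
    """
    props = {}
    for key, raw in _INFO_ENTRY.findall(pdf_bytes):
        # Only the escapes \( \) and \\ are undone; others are left as-is.
        value = re.sub(rb"\\([()\\])", rb"\1", raw)
        props[key.decode("ascii")] = value.decode("latin-1")
    return props


if __name__ == "__main__":
    # Made-up fragment standing in for the Info dictionary of a real file;
    # for a real file, pass open(path, "rb").read() instead.
    sample = b"1 0 obj\n<< /Title (Invoice 42) /Subject (Billing) >>\nendobj"
    print(read_pdf_properties(sample))
```

Reading the three properties this way avoids opening the file in the DataMapper, which is the trade-off discussed in this thread, but it only works if whoever produces the PDFs also writes the routing values into those properties.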