Do delimited data and PDF API related stuff on Datamapper

james123456 · March 29, 2022, 8:23am

Hi Guys,

Hope you are all okay and well.

I need to find a quick solution to a problem I have. I have the full paths to a few PDF files and I need to merge them into a single output. But here comes the slight problem: the paths are in a CSV-like structure, and I want to read each PDF by its file path using the AlambicEdit library, but I believe it’s only available in the Workflow. Then I could extract some pieces of information from the PDF file for some other checks. Is there something similar I can use in the Datamapper? Can I mix emulations in the Datamapper instead of having to use the Workflow? Or am I complicating things? Hope this makes sense.

Thank you in advance.

thomasweber · March 29, 2022, 9:24am

Hi james,

Did I understand correctly that you have a CSV file containing the complete path information for several PDF files?

In that case, I would use the CSV file as the input data in the Datamapper and then, in the Designer, include the PDF file dynamically for each record as a background PDF, based on the data model. As far as I know there is no way to read the PDF files in the Designer (except for size and page information).
Alternatively, you could merge the individual PDF files into one large PDF file in advance and then use that as the input file in Datamapper.

What information specifically do you need to read from the PDF?

jouberto · March 29, 2022, 2:45pm

Hi James,

Since you bring up the PDF library, I believe it will offer the best performance in this case. You loop through each PDF and extract the info from each, either adding the extracted data to your CSV or creating a new file in the format of your choice, then use that as your input. It would be simple to merge the PDFs beforehand and use the mapper on the result, but you would introduce overhead. If you have a large number of PDFs, that extra time could be non-negligible. It will depend on your overall needs and scenario. Both approaches have pros and cons. Ultimately, the answer to the question is no, you can’t mix “emulations” in the Datamapper. There are cases where you can complement your data using an action task, but in the case of PDF text extraction, I don’t know of a way to achieve that on the fly.
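
The pre-processing approach described above could look roughly like the sketch below. It is only an illustration in plain Python, not Workflow or Datamapper code: the function name enrich_rows, the column names pdf_path and extracted, and the extract_text callback are all hypothetical, and the callback stands in for whatever PDF library actually does the text extraction.

```python
import csv
import io


def enrich_rows(csv_text, extract_text, path_column="pdf_path", new_column="extracted"):
    """Return CSV text with one extra column holding data pulled from each PDF.

    extract_text is a caller-supplied function (hypothetical here) that takes
    a PDF path and returns the string to store for that row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + [new_column],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # One extraction call per PDF path listed in the input file.
        row[new_column] = extract_text(row[path_column])
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder extractor so the sketch runs without a PDF library.
    def fake_extract(path):
        return "text from " + path

    sample = "id,pdf_path\n1,C:/in/a.pdf\n2,C:/in/b.pdf\n"
    print(enrich_rows(sample, fake_extract))
```

The enriched file would then be the data file handed to the Datamapper, so the extraction happens once per PDF before mapping rather than on the fly.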

james123456 · March 30, 2022, 8:21am

Hi @thomasweber,

That is correct, you got the question right. I just need text extraction on a specific page, and then to use this information to perform some logic within the Datamapper. I already do this in some of my projects, though there is nothing I can do with this information.

@jouberto I was afraid you would say that. Is this something that can be added to the Datamapper in future? We have access to the openTextReader()/TextReader/openTextWriter APIs when using a CSV/delimited/text data file, so I just wondered whether an openPDFReader() or something similar could be added, as a feature request?

Thanks all, I appreciate your help. I will try to find some other way to achieve this in the meantime.

Phil · March 30, 2022, 9:41am

@james123456: we are already planning on adding something similar to your request, but I can’t give you a timeline for the feature yet.