I can follow the data modeling process for PDF data extraction, but I’m wondering if there’s a similar process for PDF “clipped regions” from V7?
I’ve found that with PDF input it’s usually not just pure data extraction; clipped regions are also needed to add/remove barcodes & logos, as well as to reflow or retain document components. I understand that design in Connect will be very different, and that’s OK, but I don’t seem to be able to find that feature… am I missing it?
Currently, it’s not possible to capture images from PDF input in the DataMapper. In fact, having binary data in the Record is not possible.
There are a few possible workarounds to use until this becomes available, if and when it does. However, they all require other tools.
One option would be to use a PlanetPress Design document, use its clipped region feature to select the image, then save that as an image in a specific location using PlanetPress Image’s features. This image could then be accessed from the Designer module using a specific path, either on the hard drive or on a URL.
Another option would be to use external programs to crop the PDF, then save the result as an image. There seem to be a few, such as PyPDF, Ghostscript, Coherent PDF, ImageMagick, PDFill… In each case, you can still only export an image, which you’ll need to save somewhere and access from the Designer module.
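To give an idea of what the external-crop route can look like, here is a rough Python sketch that asks Ghostscript to render one rectangular region of a PDF page to a PNG. It is only an illustration, not something Connect provides: the function names, file names and coordinates are made up, it assumes Ghostscript is installed and reachable as `gs` (on Windows the console executable is typically `gswin64c`), and the region is given in PDF points measured from the bottom-left corner of the page.

```python
import subprocess


def build_crop_command(pdf_path, png_path, x, y, width, height,
                       page=1, dpi=300, gs="gs"):
    """Build the Ghostscript argument list for rendering one region to PNG.

    x, y          lower-left corner of the region, in points (1/72 inch),
                  measured from the bottom-left corner of the page
    width, height size of the region, in points
    All names here are hypothetical; adjust them to your own setup.
    """
    return [
        gs,
        "-o", png_path,                       # output file (implies batch mode)
        "-sDEVICE=png16m",                    # 24-bit colour PNG output
        f"-r{dpi}",                           # rendering resolution
        f"-dFirstPage={page}",                # render only the requested page
        f"-dLastPage={page}",
        f"-dDEVICEWIDTHPOINTS={width}",       # canvas size = region size
        f"-dDEVICEHEIGHTPOINTS={height}",
        "-dFIXEDMEDIA",                       # ignore the PDF's own page size
        # Shift the page so the region's corner lands on the canvas origin.
        "-c", f"<</PageOffset [{-x} {-y}]>> setpagedevice",
        "-f", pdf_path,
    ]


def crop_region(pdf_path, png_path, x, y, width, height, **options):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    command = build_crop_command(pdf_path, png_path, x, y, width, height,
                                 **options)
    subprocess.run(command, check=True)
```

For example, `crop_region("invoice.pdf", "logo.png", x=36, y=700, width=200, height=60)` would try to save a 200 x 60 point area as `logo.png`, which you would then reference from the Designer by its path. Check the result on a sample file before relying on it, since the offsets depend on how the PDF defines its page boxes.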
There’s technically a way to have an image in the record: encode it as a base64 string. However, that would still require the external crop & convert step, plus a script that loads the file and converts it (there’s a possible conversion script here), so it would be an extra step unless you really need to do it this way.
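As a rough idea of what such a conversion involves (this is not the script linked above, just a minimal Python sketch with made-up function names), the already-cropped image file is read as raw bytes and turned into a plain ASCII string that can be stored in a text field:

```python
import base64
from pathlib import Path


def encode_image_bytes(data: bytes) -> str:
    """Return the base64 text form of raw image bytes."""
    return base64.b64encode(data).decode("ascii")


def image_file_to_base64(path: str) -> str:
    """Read an image file from disk and return it as a base64 string."""
    return encode_image_bytes(Path(path).read_bytes())
```

Calling `image_file_to_base64("logo.png")` (a hypothetical file name) gives a string that is roughly a third larger than the original file, so this is best kept for small images such as logos.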
Of course, these workarounds are only until this feature is implemented in the DataMapper module!
If I understand correctly, then, Connect isn’t able to use PDF as input in any scenario where the PDF itself needs to be used as a resource in some fashion (background, clipped regions, etc.). At least not without a fair amount of work outside the Connect Design environment.
I, for one, would definitely vote for getting the functionality into Connect in some way, shape or form if at all possible.
It’s a pretty compelling feature that I see used quite a bit. In fact, I was just involved in a large project that was won solely on PlanetPress’s ability to enrich PDF datastreams (i.e. interchange PDF regions and data extraction regions through the workflow). Even if the current V7 Design tool had to live on under the V8 umbrella, instead of folding the feature into Connect, I’d think it would be a smart move.
For now, sticking with V7 for these types of projects works for me.
Actually, there is one thing you can do with the PDF: place it as a background. When you have a PDF data mapping configuration, I believe that by default, in a Print context, it will add the PDF as a background image to the section and repeat the section automatically for all the pages in the record (if it doesn’t, check the section properties; there’s an option for it). This is equivalent to the “Enhance with PlanetPress” feature that was introduced for PDF files in version 7, so you can add barcodes and OMR marks and do imposition on PDF files in Connect - much more easily, actually!
The PDF tools in the DataMapper and Designer modules will gradually reach the level you were used to in V7, with features such as clipping, displaying a specific page of a PDF, etc.