I think @jbeal84’s request has to do with merging existing PDFs, rather than splitting a large PDF into grouped PDFs. If that’s the case, then it can be achieved with a Workflow process, without involving the DataMapper or a Connect template.
The idea is to get a listing of all PDFs that need to be merged based on the postal address, extract that address and concatenate the files that have the same address into single output PDFs. Here’s a sample process that would do that:
The key part of the process is the short script highlighted in yellow (see code below).
That script reads the postal address from the input PDF and checks whether that address has been recorded before. If it has, the script assigns the same output file name, so that the Send To Folder task concatenates the current PDF with all the others that had the same postal address. If the address is encountered for the first time, the script generates a new, unique output file name (using %u) and records the new name, along with the associated postal address, in the allFiles variable (which is just a JSON array).
Here’s the script:
// Load list of existing addresses and corresponding file names
var allFiles = JSON.parse(Watch.GetVariable("allFiles"));
var inputFile = Watch.GetVariable("inputFile");
// Read postal address from current PDF
var inputPDF = Watch.GetPDFEditObject();
inputPDF.Open(inputFile,false);
var oneAddress = inputPDF.Pages(0).ExtractText2(0.40625,1.67708,4.83333,2.52083).replace(/\n/g,"_");
CollectGarbage();
inputPDF.Close();
// Check if address already exists
var existingAddress = allFiles.filter(function(elem){
return elem.address==oneAddress;
});
// Set file name and if it's the first time this address is encountered,
// record it and its corresponding file name in the allFiles variable
if(existingAddress.length>0){
Watch.SetVariable("fileName",existingAddress[0].fileName);
} else {
var newAddress = {address:oneAddress,fileName:Watch.ExpandString("%u")};
allFiles.push(newAddress);
Watch.SetVariable("fileName",newAddress.fileName);
Watch.SetVariable("allFiles",JSON.stringify(allFiles));
}
Obviously, this code generates random (albeit unique) output file names, so you may want to adjust that using some of your own logic. In addition, the postal address is read from a specific location on Page 1 of each PDF, so you will also have to adjust that to match your own PDF files.
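As one possible way to get more readable output names, the file name could be derived from the address itself instead of %u. The sketch below is only an illustration: `addressToFileName` and its `maxLength` parameter are hypothetical names, not part of the Workflow API, and it assumes the address lines were joined with underscores as in the script above.

```javascript
// Hypothetical helper (not part of the Workflow API): build a readable,
// file-system-safe name from the extracted postal address.
// Assumes the address lines were joined with "_" as in the script above.
function addressToFileName(address, maxLength) {
  var safe = String(address)
    .replace(/[^A-Za-z0-9]+/g, "_") // collapse every run of other characters into "_"
    .replace(/^_+|_+$/g, "");       // trim leading and trailing underscores
  // Optionally cap the length to keep paths manageable
  if (maxLength && safe.length > maxLength) {
    safe = safe.substring(0, maxLength);
  }
  // Fall back to a fixed name if nothing usable is left
  return safe || "unknown_address";
}
```

In the script, `Watch.ExpandString("%u")` could then be swapped for something like `addressToFileName(oneAddress, 60)`. Keep in mind that two different addresses can end up with the same name once they are sanitized or truncated, and that accented characters are replaced with underscores, so appending %u to the result may still be a good idea.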
Note that the allFiles variable has a default value of [] (i.e. an empty array). I could have implemented some code in the script to check whether or not the variable has already been initialized, but I went for the easy method instead.
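For anyone who would rather not rely on the default value, the initialization check could look something like the sketch below. `loadAllFiles` is a hypothetical helper name, not something Workflow provides; it takes the raw string as returned by `Watch.GetVariable("allFiles")` and falls back to an empty array when the value is empty or is not a valid JSON array.

```javascript
// Hypothetical helper (not part of the Workflow API): turn the raw
// contents of the allFiles variable into an array, falling back to an
// empty array when the variable is empty or does not hold a JSON array.
function loadAllFiles(raw) {
  if (!raw) {
    return [];
  }
  try {
    var parsed = JSON.parse(raw);
    // toString check also works in script engines that lack Array.isArray
    var isArray = Object.prototype.toString.call(parsed) === "[object Array]";
    return isArray ? parsed : [];
  } catch (e) {
    // Invalid JSON: start over with an empty list
    return [];
  }
}
```

With that in place, the first line of the script would become `var allFiles = loadAllFiles(Watch.GetVariable("allFiles"));`.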
Also note that the two Change Emulation tasks are critical: the main branch loops on XML data while the script needs to read from PDF data, so you have to make sure the process's emulation is aware of those changes in the data format along the way.