Your first step should be to break the PDF into logical records. If an invoice is multiple pages, your record should be composed of all of those pages.
So first and foremost, you’d need a boundary script that sets a boundary when it detects the first page of either doc type. You can find some examples here: PlanetPress Connect 2019.1 User Guide
From there, the sorting will be easy. You won’t be doing the sort with workflow metadata, but rather with a Job Preset that’s either sorting or grouping based on your criteria. Once the Job Preset does its thing, you’ll output to PDF from your Output Preset and you’re done.
Edit: I’ve gone ahead and built up a basic example of what this might look like. You’ll find the resources here: https://objlune-my.sharepoint.com/:f:/g/personal/albertsn_ca_objectiflune_com/EjGvHUB7KzpIk0fF4H_y_qgB21DOx2DxqsXmBH0RrTO41w?e=YeZ2Td
I assumed that every page of each doc type contains the invoice number in the exact same place. That is to say, DocType1 has it at X on every page and DocType2 has it at Y on every page.
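// Read the region where each doc type prints its number on the current page
var zeDocType1 = boundaries.get(region.createRegion(180,1,200,30));
var zeDocType2 = boundaries.get(region.createRegion(1,1,40,30));
// Set a boundary when either value differs from the previously stored one
if (zeDocType1[0] != boundaries.getVariable("lastDoc1") || zeDocType2[0] != boundaries.getVariable("lastDoc2")) {
    boundaries.set();
}
// Remember the current values for the next comparison
boundaries.setVariable("lastDoc1", zeDocType1[0]);
boundaries.setVariable("lastDoc2", zeDocType2[0]);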
This is a modification of the example provided in my previous link about scripted boundaries. DocType1 in my example has a customer number in the upper right, whereas DocType2 has it in the upper left. So the first thing we do is to store these regions in variables.
Next is a simple condition to compare our current values to any previously stored value. If they are not equal, we set our boundary.
Finally, we store our current value as the ‘previous’ value. This way, the next time the script runs, it will know what the previous page contained.
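If it helps to see the logic outside of the DataMapper, here is a rough stand-alone sketch that can be run in Node. It is not the DataMapper API: `splitIntoRecords`, `samplePages`, and the `doc1`/`doc2` property names are made up for illustration, and plain variables stand in for `boundaries.getVariable()` and `boundaries.setVariable()`. It assumes, as above, that the script is evaluated once per page.

```javascript
// Stand-alone simulation of the boundary logic (hypothetical names, not the DataMapper API).
// Each page object holds the text read from the two regions: doc1 (upper right), doc2 (upper left).
function splitIntoRecords(pages) {
  const records = [];
  let lastDoc1 = null; // stands in for boundaries.getVariable("lastDoc1")
  let lastDoc2 = null; // stands in for boundaries.getVariable("lastDoc2")
  for (const page of pages) {
    // Same condition as the boundary script: did either region change?
    if (page.doc1 !== lastDoc1 || page.doc2 !== lastDoc2) {
      records.push([]); // stands in for boundaries.set(): a new record starts here
    }
    records[records.length - 1].push(page);
    // Store the current values as the 'previous' values for the next page
    lastDoc1 = page.doc1;
    lastDoc2 = page.doc2;
  }
  return records;
}

// Made-up sample: a 3-page DocType1, a 2-page DocType2, then a 1-page DocType1.
const samplePages = [
  { doc1: "1001", doc2: "" },
  { doc1: "1001", doc2: "" },
  { doc1: "1001", doc2: "" },
  { doc1: "", doc2: "1001" },
  { doc1: "", doc2: "1001" },
  { doc1: "1002", doc2: "" },
];

// Page counts per record: 3, 2, 1
console.log(splitIntoRecords(samplePages).map((r) => r.length));
```

Note that the very first page always starts a record, because nothing has been stored yet and the comparison fails.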
This gets us our boundaries nicely defined, and my 314-page sample data gets broken up into 78 records as expected. Each of those records contains a variable number of pages (in reality it’s either 3 or 5, since I didn’t take a lot of time making truly random sample data).
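As a quick sanity check on those numbers, the sketch below works out how many 3-page and 5-page records would add up to 78 records and 314 pages. The function name `splitCounts` and its parameters are hypothetical; it only does the arithmetic and knows nothing about the actual sample file.

```javascript
// Solve a + b = records and small*a + large*b = pages for whole-number counts.
// Returns null when no non-negative whole-number solution exists.
function splitCounts(records, pages, small, large) {
  const b = (pages - small * records) / (large - small); // number of 'large' records
  const a = records - b;                                 // number of 'small' records
  return Number.isInteger(b) && a >= 0 && b >= 0 ? { small: a, large: b } : null;
}

// 38 records of 3 pages and 40 records of 5 pages: 114 + 200 = 314 pages
console.log(splitCounts(78, 314, 3, 5));
```

If your own record count and page count don't reconcile this way, that's usually a hint that a boundary was missed or set on the wrong page.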
The template then uses the original PDF pages as the background and I overlay a bit of data on that. Nothing fancy here. I’m just taking data off of the page and reprinting it. But this could represent a barcode or a fully re-designed and ‘normalized’ template.
I’ve also included the Job and Output Presets. The job is doing nothing more than your two-level sort: first on my customer number, then on my doc type. A note on the doc type: in my case, it was written clearly on the page and I simply extracted the text into a datamapping field. In reality this is not likely to be so simple. Instead, some sort of condition would be required in the datamapper to determine which type of document we’re currently looking at, and then store that value in a datamapping field.
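To make that last part a bit more concrete, here is a minimal stand-alone sketch (runnable in Node) of one possible condition plus the resulting two-level ordering. Everything in it is an assumption for illustration: `detectDocType`, `compareRecords`, and the field names `customerNumber` and `docType` are hypothetical, and the rule "whichever region contains text decides the type" is only one way you might tell the documents apart. In Connect the sort itself is still done by the Job Preset, not by a script like this.

```javascript
// Hypothetical rule: the region that contains text decides the doc type.
function detectDocType(upperRightText, upperLeftText) {
  if (upperRightText.trim() !== "") return "DocType1";
  if (upperLeftText.trim() !== "") return "DocType2";
  return "Unknown";
}

// Plain string comparison helper returning -1, 0 or 1.
function cmp(x, y) {
  return x < y ? -1 : x > y ? 1 : 0;
}

// Two-level comparison: customer number first, doc type second.
function compareRecords(a, b) {
  return cmp(a.customerNumber, b.customerNumber) || cmp(a.docType, b.docType);
}

// Made-up extracted records, deliberately out of order.
const extracted = [
  { customerNumber: "1002", docType: detectDocType("1002", "") },
  { customerNumber: "1001", docType: detectDocType("", "1001") },
  { customerNumber: "1001", docType: detectDocType("1001", "") },
];

// Copy before sorting so the original array is left untouched.
const sorted = [...extracted].sort(compareRecords);

// Order: 1001 DocType1, 1001 DocType2, 1002 DocType1
console.log(sorted.map((r) => r.customerNumber + " " + r.docType));
```

The customer numbers are compared as strings here, so this only orders correctly if they all have the same length; convert to numbers first if yours don't.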