Metadata sorter not performing "then by"?

garycarroll · November 17, 2019, 5:57pm

I have a single PDF containing many invoices, which come from two different sources, and have different formats. These are Doctype 1 or Doctype 2, identifiable by the format of the invoice. There may be any number of Doctype 1 and Doctype 2 invoices for any given AccountNumber, and they may be in the input PDF in any order.

Each invoice has a field containing AccountNumber.

The goal is to sort the PDF so that all invoices for the same AccountNumber are together, and then to sort within each account so that Doctype 1 invoice(s) come before Doctype 2 invoice(s). There may be Doctype 1s with no matching Doctype 2, and these are printed. However, Doctype 2s with no matching Doctype 1 are not printed.
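To make the intended result concrete, here is a minimal sketch in plain JavaScript. It is not PlanetPress code: the record shape, the sample values and the function name orderInvoices are all made up for illustration.

```javascript
// Hypothetical invoice records; field names mirror the post, values are invented.
const invoices = [
  { accountNumber: "2002", doctype: 2, id: "a" },
  { accountNumber: "1001", doctype: 2, id: "b" },
  { accountNumber: "1001", doctype: 1, id: "c" },
  { accountNumber: "3003", doctype: 1, id: "d" },
  { accountNumber: "1001", doctype: 1, id: "e" },
];

function orderInvoices(records) {
  // Accounts that have at least one Doctype 1 invoice.
  const accountsWithType1 = new Set(
    records.filter(r => r.doctype === 1).map(r => r.accountNumber)
  );
  return records
    // Drop Doctype 2 invoices whose account has no Doctype 1 invoice.
    .filter(r => r.doctype === 1 || accountsWithType1.has(r.accountNumber))
    // Primary key: accountNumber. Secondary key ("then by"): doctype.
    .sort((a, b) =>
      a.accountNumber.localeCompare(b.accountNumber) || a.doctype - b.doctype
    );
}

console.log(orderInvoices(invoices).map(r => r.id).join(","));
```

Invoice "a" is dropped because account 2002 has no Doctype 1, and within account 1001 both Doctype 1 invoices come before the Doctype 2 invoice.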

The problem:

I use a datamapper to break on the first page of each invoice (an invoice may be more than one page) and extract the fields “Doctype” and “AccountNumber”.

I then use metadata sorter to sort first on AccountNumber and then on Doctype.

The metadata seems to be sorted properly on AccountNumber, but there is no sort on Doctype. No error is noted.

I’m not clear on how I use the metadata to produce a sorted PDF, but I suspect that is going to become clear with a little more experimentation. But, I’m stuck on why the secondary sort is not happening. Any common errors that come to mind?

AlbertsN · November 18, 2019, 2:37pm

Your first step should be to break the PDF into logical records. If an invoice is multiple pages, your record should be composed of all of those pages.

So you’d need to create a boundary script first and foremost that can break when it detects the first page of either doctype. You can find some examples here: PlanetPress Connect 2019.1 User Guide

From there, the sorting will be easy. You won’t be doing the sort with workflow metadata, but rather with a Job Preset that’s either sorting or grouping based on your criteria. Once the Job Preset does its thing, you’ll output to PDF from your Output Preset and you’re done.

Edit: I’ve gone ahead and built up a basic example of what this might look like. You’ll find the resources here: https://objlune-my.sharepoint.com/:f:/g/personal/albertsn_ca_objectiflune_com/EjGvHUB7KzpIk0fF4H_y_qgB21DOx2DxqsXmBH0RrTO41w?e=YeZ2Td

I assumed that every page of each doc type contains the invoice number in the exact same place. That is to say, DocType1 has it at X on every page and DocType2 has it at Y on every page.

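// Read the text found in each doc type's region on the current page.
var zeDocType1 = boundaries.get(region.createRegion(180,1,200,30));
var zeDocType2 = boundaries.get(region.createRegion(1,1,40,30));

// Start a new record when either value differs from the one stored for the previous page.
if(zeDocType1[0]!=boundaries.getVariable("lastDoc1") || zeDocType2[0]!=boundaries.getVariable("lastDoc2")){
	boundaries.set();
}

// Remember the current values for the next comparison.
boundaries.setVariable("lastDoc1",zeDocType1[0]);
boundaries.setVariable("lastDoc2",zeDocType2[0]);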

This is a modification of the example provided in my previous link about scripted boundaries. DocType1 in my example has a customer number in the upper right, whereas DocType2 has it in the upper left. So the first thing we do is to store these regions in variables.

Next is a simple condition to compare our current values to any previously stored value. If they are not equal, we set our boundary.

Finally, we actually store our current value as the ‘previous’ value. This way when the script loops at the next record, it will know what the previous record contained.

This gets us our boundaries nicely defined and my 314 page sample data gets broken up into 78 records as expected. Each of those records contains a variable number of pages (in reality it’s either 3 or 5 since I didn’t take a lot of time making truly random sample data).
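If it helps to see the comparison logic outside of the datamapper, here is a rough stand-alone sketch in plain JavaScript. The pages array, its doc1/doc2 properties and the splitIntoRecords function are assumptions made up for this illustration; they stand in for what boundaries.get() reads from the two regions on each page.

```javascript
// Stand-in for the page data: each page exposes the text found in the two regions.
// These values are invented; in the real script they come from boundaries.get(...).
const pages = [
  { doc1: "C100", doc2: "" },
  { doc1: "C100", doc2: "" },
  { doc1: "", doc2: "C100" },
  { doc1: "", doc2: "C100" },
  { doc1: "", doc2: "C200" },
];

function splitIntoRecords(pageList) {
  const records = [];
  let lastDoc1; // undefined before the first page, so the first page always starts a record
  let lastDoc2;
  for (const page of pageList) {
    // Same test as the boundary script: start a new record when either value changed.
    if (page.doc1 !== lastDoc1 || page.doc2 !== lastDoc2) {
      records.push([]);
    }
    records[records.length - 1].push(page);
    // Store the current values as the "previous" values for the next page.
    lastDoc1 = page.doc1;
    lastDoc2 = page.doc2;
  }
  return records;
}

console.log(splitIntoRecords(pages).map(r => r.length));
```

With this sample, the five pages fall into three records of 2, 2 and 1 pages. Note that in this sketch two consecutive documents of the same type carrying the same value would end up in one record, so whether the approach fits depends on your data.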

The template then uses the original PDF pages as the background and I overlay a bit of data on that. Nothing fancy here. I’m just taking data off of the page and reprinting it. But this could represent a barcode or a fully re-designed and ‘normalized’ template.

I’ve also included the Job and Output Presets. The job does nothing more than your two-level sort: first on my customer number, then on my doc type. A note on the doc type: in my case, it was written clearly on the page and I simply extracted the text into a datamapping field. In reality this is not likely to be so simple. Instead, some sort of condition would be required in the datamapper to determine which type of document we’re currently looking at and store that value in a datamapping field.
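As a rough idea of what such a condition could look like, here is a sketch in plain JavaScript. It assumes the layout from my sample (DocType1 has its number in the upper right, DocType2 in the upper left). The function name detectDocType and its two arguments are hypothetical, and how you actually obtain the text of each region depends on your data mapping configuration.

```javascript
// Hypothetical helper: the arguments stand for the text extracted from the
// upper-right and upper-left regions of a record's first page.
function detectDocType(upperRightText, upperLeftText) {
  const hasRight = upperRightText.trim() !== "";
  const hasLeft = upperLeftText.trim() !== "";
  if (hasRight && !hasLeft) return "1"; // number only in the upper right: DocType1
  if (hasLeft && !hasRight) return "2"; // number only in the upper left: DocType2
  return "unknown"; // both or neither region has text, so flag it for review
}

console.log(detectDocType("C100", ""));
console.log(detectDocType("", "C100"));
```

Returning an explicit "unknown" value instead of guessing makes it easier to spot records that match neither layout before they reach the sort.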