Trim PDF Pages Before Extracting Data

Hi,

I am very new to OL Connect. Is there a way to have Designer trim the last 2 pages from a PDF before setting my record boundaries?

Many of the PDF data files I am working with have the same structure: Summary (2 pages) > Content > Last Page Indicator (2 pages because of duplex printing).

I am splitting the records on text “PAGE 1 OF” which only appears in the Content section.

Connect already conveniently treats the Summary section as record 1 (I am guessing because it has not encountered the boundary definition yet). I am able to filter this out using the job preset settings by extracting an area which contains the word “SUMMARY” on the first page of only that record.

It is the Last Page Indicator that is creating a problem. It does not contain the boundary information “PAGE 1 OF”, so Connect is just treating it as the last two pages of the last record.
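
To illustrate the behaviour described above, here is a rough sketch in plain JavaScript. It is only a toy model of boundary splitting, not how Connect actually implements it: each page is represented by a string, and `splitIntoRecords` and the sample page texts are made up for the example.

```javascript
// Toy model (assumption): a new record starts on any page containing the boundary text.
// Pages before the first boundary form their own record; pages after the last
// boundary stay attached to the last record.
function splitIntoRecords(pages, boundaryText) {
  var records = [];
  var current = [];
  for (var i = 0; i < pages.length; i++) {
    var startsRecord = pages[i].indexOf(boundaryText) !== -1;
    if (startsRecord && current.length > 0) {
      records.push(current);
      current = [];
    }
    current.push(pages[i]);
  }
  if (current.length > 0) {
    records.push(current);
  }
  return records;
}

// Hypothetical sample: 2 summary pages, two content records, 2 trailing indicator pages
var pages = [
  "SUMMARY", "summary back",
  "PAGE 1 OF 2", "PAGE 2 OF 2",
  "PAGE 1 OF 1",
  "LAST PAGE", "blank back"
];
var records = splitIntoRecords(pages, "PAGE 1 OF");
var recordLengths = records.map(function (r) { return r.length; });
```

In this model the Summary pages end up as their own first record, while the two Last Page Indicator pages are appended to the final content record, which is the problem I am trying to avoid.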

I would do it in Workflow using the Alambic API, like so:

// Full path of the current job file (%o only holds the original file name, without a path)
var originalPDF = Watch.GetJobFileName();

// Load the original PDF
var pdf = Watch.GetPDFEditObject();
pdf.Open(originalPDF, false);

// Check the number of pages and delete the last two if the PDF has more than 2 pages
var nbPages = pdf.Pages().Count();
Watch.Log("nbPages at first= " + nbPages, 2);

if (nbPages > 2) {
  // Delete from the highest index down so the remaining indexes do not shift
  for (var i = nbPages - 1; i >= nbPages - 2; i--) {
    pdf.Pages().Delete(i);
  }
}

nbPages = pdf.Pages().Count();
Watch.Log("nbPages at end= " + nbPages, 2);

// Save the PDF
pdf.Save(false);
CollectGarbage();
pdf.Close();
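
One detail worth spelling out: the pages are removed from the highest index downward, so deleting one page never changes the index of the next page to delete. Here is a minimal sketch of that index calculation in plain JavaScript, assuming zero-based page indexes as the script above does. `trailingPageIndexes` is a hypothetical helper name, not part of the Workflow or Alambic API.

```javascript
// Hypothetical helper: returns the zero-based indexes of the last `count` pages,
// highest first. Returns an empty list when the document has `count` pages or
// fewer, so a short PDF is never emptied (same guard as nbPages > 2 above).
function trailingPageIndexes(pageCount, count) {
  var indexes = [];
  if (pageCount <= count) {
    return indexes;
  }
  for (var i = pageCount - 1; i >= pageCount - count; i--) {
    indexes.push(i);
  }
  return indexes;
}

var sixPages = trailingPageIndexes(6, 2); // indexes of pages 6 and 5
var twoPages = trailingPageIndexes(2, 2); // nothing to delete
```

In a Workflow script you could loop over the returned list and call pdf.Pages().Delete() on each index, which also makes it easy to trim a different number of trailing pages later.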


Thank you! I am not too familiar with Workflow yet, but I will give this a shot once I have learned the basics and will update you then.