Variable page range in a PDF

I have a PDF that is 1000 pages. Those 1000 pages contain a variable number of pages per record (record 1 is 2 pages, record 2 is 4 pages, record 3 is 4 pages, record 4 is 2 pages, etc.). How do I set that up in my data mapper to recognize a group of pages as one record? I know I can use the account number, which is located in the same place on the document, but I am not sure if I need to set my boundaries to On Script and write code?

Also, I am adding a seq number using the job creation settings. I only want that seq number to show up on the front page of the document, but it is showing on both sides. How do I fix that?

Hello,

For your first question, the answer is as follows:

In your DataMapper, first select the account number on the screen. Next, under the Boundaries tab, select “On text” in the Trigger drop-down list. Further down, select “On changes” in the Operator drop-down list.

That will put all pages that have the same account number in the same record.

For your second question, please explain how you added a sequential number using the “job creation settings”?

Thank you for the response but I don’t think that will work…

Page 1 is the invoice and page 2 is the backer of that invoice, so the answer above will not work because the second page doesn’t show the account number. I could also use the On Page trigger and split on 2, but sometimes a record could be 4 pages or 6 pages, so that will not work either.

I am adding a metadata document tag using a seq number that I created in the data mapper. Then in the Output Creator I am adding additional content, that being the metadata.

Hi,

If you want the data mapper to treat your PDF as data and split it into records of n pages, then what hamelj suggested will work. You need to try what someone suggests before ‘thinking’ it won’t work. :) If you had the account number on every page in the same spot, then his suggestion would not delimit the pages into records of a variable number of pages, and you would have to find something else that is unique to page one of each record.

Regards,

S

Actually, what was suggested was the first thing that I tried before posting to this forum. And, unfortunately, it doesn’t work. The account number is not on every page because page two is just a backer for every invoice page, that is why I am wondering if there is a script that I could write?

Something like: if the account changes and is not empty?

I have this working in PlanetPress 7 through PlanetPress Talk, but would like to set it up using Connect because of the impose features in the Output Creation.

It might sound like a repeated question, but is there anything that exists only on the first page of an invoice (and not on a second invoice page, leaving the backer aside)? Here I am not talking about the account number; we already discarded that. It could be anything, even a single character, as long as it is at a specific position and is specific to the first invoice page.

It could also be the absence of something on the first page of an invoice.

To add my little 2 cents: in addition to hamelj’s suggestions, you could also look for content on the backer page (i.e. page 2) which, from what you’re saying appears to be static (I’m assuming it’s Terms and Conditions, or something similar). So once you find content that is unique to that page, you can set your boundary to occur 1 page before (-1).

I appreciate all of the help and responses but maybe I am not being clear on what I need.

I have a pdf file that is, say, 300 pages.

Page 1- Account 1234 Invoice Page

Page 2- Account 1234 Invoice Backer (generic)

Page 3- Account 5678 Invoice Page

Page 4- Account 5678 Invoice Backer (generic)

Page 5- Account 5678 Invoice Page

Page 6- Account 5678 Invoice Backer (generic)

As you can see, these two accounts have a different number of pages per account. So in this example I would have 2 records.

I have not been able to find a way to make this happen using the options, that is why I was wondering if there was a script that could be written?

I’m almost certain you don’t need a script. Please post a sample PDF with two or three invoices in it and I will post back the data mapping configuration. Make sure to blot out any personal information, but keep most of the PDF as close to the original as possible.

Not sure how to add a pdf to this chat?

See Three Differents problems - XML nested tables In Datamapper - Design and tray selection in Connect - DataMapper - Upland OL User community

https://learn.objectiflune.com/qa-blobs/7284557029285909334.pdf

Ahhhhhhhhh… you’ve got Terms and Conditions after EVERY page, not just after the first one. Sorry, I didn’t realise that until I saw the PDF.

So yes, you’ll need to use a boundary script. Something like the following should do the trick:

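// Read the text in the region where the account number is printed.
var zeAccount = boundaries.get(region.createRegion(75, 50, 100, 60));
// Only invoice pages match the 999-9999.999 pattern; T&C pages do not.
if (zeAccount[0].match(/\d\d\d-\d\d\d\d\.\d\d\d/)) {
    var lastAccount = boundaries.getVariable("lastAccount");
    // A different account than the one last seen starts a new record.
    if (lastAccount != null && zeAccount[0] != lastAccount) {
        boundaries.set();
    }
    // Remember this page's account for the next comparison.
    boundaries.setVariable("lastAccount", zeAccount[0]);
}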

The code looks for the account number by matching the region with a regular expression (999-9999.999). If it finds a match (which doesn’t occur on the T&C pages), it records it and as it moves down all the pages, it compares any new value it finds with the recorded value. If that recorded value is different, it means we’re on a new document boundary. Note that the documentation for the region.createRegion() method is incorrect: the parameters are actually left, right, top, bottom.
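If it helps to check the logic outside the DataMapper, here is a minimal standalone sketch of the same idea in plain JavaScript. It does not use the DataMapper API at all: the function name findRecordStarts, the samplePages array, and the account values in it are made up for illustration, and each array entry simply stands in for whatever text the region would return on that page. Unlike the boundary script, it trims the text and anchors the pattern, which is an assumption about how clean the extracted text is.

```javascript
// Standalone simulation of the boundary logic (no DataMapper objects involved).
// pages: one string per PDF page, standing in for the text read from the account region.
function findRecordStarts(pages) {
  const accountPattern = /^\d{3}-\d{4}\.\d{3}$/; // same 999-9999.999 shape, anchored
  const starts = [];
  let lastAccount = null;

  pages.forEach((text, index) => {
    const value = text.trim();
    // Backer (T&C) pages have no account number, so they never start a record.
    if (!accountPattern.test(value)) return;
    // The first account seen, or any change of account, starts a new record.
    if (lastAccount === null || value !== lastAccount) {
      starts.push(index + 1); // 1-based page number
    }
    lastAccount = value;
  });

  return starts;
}

// Hypothetical input: invoice, backer, then a second account with two invoice pages.
const samplePages = [
  "123-4567.001", "",
  "555-0000.002", "",
  "555-0000.002", ""
];

console.log(findRecordStarts(samplePages));
```

With that sample, the function reports record starts at pages 1 and 3, i.e. one 2-page record followed by one 4-page record, which is the same split the layout described earlier in the thread calls for.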

This works beautifully! Thank you VERY much!
