I have a PDF containing tens of thousands of unsorted records. There are three types of accounts: Premium, Silver and Basic. Records have a variable number of pages. I have been asked to build a solution to update each record with the correct T&C for the account type.
The T&C pages should go after the “Marker” page within each record. Each marker page is identified by a text box in the top right corner of the page. The text box contains other text, including the word PREMIUM, SILVER or BASIC (all caps, in bold) for the account type. The T&C are in 3 separate PDFs and span 3 to 6 pages each.
I have tried creating 4 sections in the Designer: 1 for each record and the 3 others for the T&C types. I have then used a control script to enable the correct T&C section for each record. This works; however, the T&C pages are only appended at the end of Section 1 and not at the correct page index, that is, after the “Marker” page within each record. How do I get the current page index in each record? In addition, this approach tends to be quite slow and the huge PDF takes ages to generate output. I often get errors relating to insufficient RAM in the workflow, even though the system has 32GB of RAM, which is clearly not all in use.
What’s the best way to insert the correct T&C pages after the “Marker” page in each record within the PDF?
Here is some code written by one of our gurus. It answers a slightly different question from yours, but it should point you in the right direction. You will then need to look at the Alambic API. That would be the best way to do your merging, because a job as big as yours would take very long with the other Workflow plugins.
The following code concatenates a large number of PDFs. Again, it is not exactly what you wanted, but it is a start.
Start your process with a Folder Listing input task that looks for all the PDF files inside the folder where they are stored.
Then add a Run Script task with the following (JavaScript) code:
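var tmpFileName = "";
var myPDF = Watch.GetPDFEditObject();
// Backslashes must be escaped inside JavaScript string literals.
myPDF.Create("C:\\Tests\\Merged.pdf");
// Number of files reported by the Folder Listing task.
var fileCount = parseInt(Watch.ExpandString("xmlget('/files[1]/@count',Value,KeepCase,NoTrim)"), 10);
for (var i = 1; i <= fileCount; i++) {
  // Full path of file number i: its folder path followed by its file name.
  tmpFileName = Watch.ExpandString("xmlget('/files[1]/folder[1]/file[" + i + "]/path[1]',Value,KeepCase,NoTrim)xmlget('/files[1]/folder[1]/file[" + i + "]/filename[1]',Value,KeepCase,NoTrim)");
  Watch.Log(tmpFileName, 2);
  // Append every page of the file at the end of the merged PDF.
  myPDF.Pages().InsertFrom(tmpFileName, 0, -1, myPDF.Pages().Count());
}
// Write the merged PDF to disk and release the file.
myPDF.Save(false);
myPDF.Close();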
Thank you for this. I have tried your script, but it doesn’t do anything at all and I don’t see how I can edit it to make it work. This is not what I am trying to achieve. I am not trying to merge many PDFs, as I don’t have many PDFs: only a single huge PDF with many records.
So, just to explain again. I have 1 single PDF (tens of thousands of pages with many unsorted record types: PREMIUM, SILVER and BASIC accounts). All I want to do is insert the T&C pages inside each record after the “Marker” page.
So for example. Let’s assume I have one PDF of 24 pages with 4 records:
Record 1 is a PREMIUM account and it has 9 pages. The Marker page here is the 5th page. I want to insert the 6 Premium T&C pages after page 5 of this record.
Record 2 is a BASIC account and it has 5 pages. The Marker page here is the 3rd page. I want to insert the 3 Basic T&C pages after page 3 of this record.
Record 3 is a SILVER account and has 6 pages. The Marker page is the 5th page. I want to insert the 4 Silver T&C pages after page 5 of this record.
Record 4 is another SILVER account with 4 pages. The Marker page for this record is the last page. I want to insert the 4 Silver T&C pages after page 4 (or after the last page) of this record.
I need to find a way to read all the pages in the PDF, then identify the Marker page in each record and simply insert the corresponding T&C pages for that record.
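The 24-page example above can be simulated with plain arrays to check where each T&C block should land. This is only a sketch of the insert-and-skip logic and does not call any PDF API: the names accountTypeOf, insertTerms and TC_PAGES, the sample marker text and the placeholder page labels are all assumptions made for illustration. In a real Workflow script the page text would come from the PDF itself and the insertion would be done on the PDF pages.

```javascript
// Hypothetical T&C page counts, taken from the 24-page example.
const TC_PAGES = { PREMIUM: 6, SILVER: 4, BASIC: 3 };

// Return the account type named in the marker text, or null if the page
// is not a marker page. The text box may contain other words as well,
// so the keyword is searched for rather than compared for equality.
function accountTypeOf(text) {
  const match = /\b(PREMIUM|SILVER|BASIC)\b/.exec(text);
  return match ? match[1] : null;
}

// pages: one string per page holding the text found in the marker region
// (an empty string for ordinary pages). Returns a new array with
// placeholder T&C pages inserted right after each marker page.
function insertTerms(pages) {
  const out = pages.slice();
  let i = 0;
  while (i < out.length) {
    const type = accountTypeOf(out[i]);
    if (type !== null) {
      const tc = [];
      for (let p = 1; p <= TC_PAGES[type]; p++) {
        tc.push(type.toLowerCase() + "-tc-" + p);
      }
      out.splice(i + 1, 0, ...tc); // insert after the marker page
      i += tc.length;              // skip the pages just inserted
    }
    i += 1;
  }
  return out;
}

// The 4 records from the example: 9 + 5 + 6 + 4 = 24 pages.
// Marker pages (0-based): 4, 9 + 2 = 11, 14 + 4 = 18, 20 + 3 = 23.
const pages = new Array(24).fill("");
pages[4] = "Account type: PREMIUM";
pages[11] = "Account type: BASIC";
pages[18] = "Account type: SILVER";
pages[23] = "Account type: SILVER";

const result = insertTerms(pages);
console.log(result.length); // 41 = 24 original pages + 6 + 3 + 4 + 4
```

Because every insertion shifts the following pages, the loop index has to jump over the pages it has just added; otherwise the later marker pages would be looked up at their old positions.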
Use the Run Script action and set its language to VBScript. Then copy and paste the script below. You will need to edit the parameters of the ExtractText2() function so that they match the region of the page containing the text that identifies the marker page and the account type of each record. Once the corresponding T&C pages are inserted, the loop index is advanced by the number of pages in the T&C, so that the inserted pages are not scanned again.
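Dim objPDF, accountType, i
Set objPDF = Watch.GetPDFEditObject
objPDF.Open Watch.GetJobFileName, false
' Page indexes are zero-based.
i = 0
Do While i <= objPDF.Pages.Count - 1
    ' Read the text of the region that holds the account type on marker pages.
    accountType = Trim(objPDF.Pages(i).ExtractText2(2.02083,0.96875,5.89583,1.18753))
    If (accountType = "BASIC") Then
        ' Insert all pages of the T&C file right after the marker page.
        objPDF.Pages.InsertFrom "E:\Forums\156624\TC\BasicTC.pdf", 0, -1, i+1
        ' Skip the 3 pages just inserted.
        i = i + 3
    ElseIf (accountType = "SILVER") Then
        objPDF.Pages.InsertFrom "E:\Forums\156624\TC\SilverTC.pdf", 0, -1, i+1
        ' Skip the 4 pages just inserted.
        i = i + 4
    ElseIf (accountType = "PREMIUM") Then
        objPDF.Pages.InsertFrom "E:\Forums\156624\TC\PremiumTC.pdf", 0, -1, i+1
        ' Skip the 6 pages just inserted.
        i = i + 6
    End If
    ' Move on to the next page.
    i = i + 1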