Accessing and using metadata in a workflow script

Hey everybody,

I’m working on a workflow that picks up a couple of PDFs, merges them and then inserts a page in front of each document. Those pages show the number of pages in the document that follows. I’ve managed to put together a Run Script plugin (programmed in JavaScript) that inserts the infopages into the main PDF, but I currently have no way to figure out where each document starts and ends, so I can’t automate the insertion process. I’m assuming I can get the information I need from the ‘Metadata Manipulation API Reference’, but I’m having a hard time figuring out how to access this information in my script.

There are a number of ways of achieving this; it really depends on your workflow process. You say you’ve managed to insert the infopages in the “main” PDF (I assume that means the “merged” PDF), but you don’t mention how you generate those infopages or whether they are available before, during, or after the merging operation. You also don’t mention how the merge is actually performed.

I think it would be easier to open a call with our Support team; they will be able to give you a procedure that matches your current process.

I’m not convinced that a conversation over the phone would be very helpful for solving my problem, but I’ll keep it in mind. It’s a lot easier to show things in writing, so I’m going to try again first, with a more in-depth description of what I have and what I want to achieve.

Below is pretty much what my workflow looks like at the moment.

Merge PDFs: merges the input PDF files, optimizes the resulting PDF and creates metadata

|

Run Script:

// Open Current Job file as the PDF to modify
var InputPDF = Watch.GetPDFEditObject();
InputPDF.Open(Watch.GetJobFileName(),false);
// Path of the job's metadata file (not used yet)
var metadata = Watch.GetMetadataFileName();

// Open a PDF file containing the page to be added to the inputPDF
// (backslashes must be escaped inside a JavaScript string)
var InsertPDF = Watch.GetPDFEditObject();
InsertPDF.Open("C:\\PlanetPress\\PdfInput\\Insert.pdf",false);

// Currently inserts a page every 10 pages. Needs to be changed to insert a page before each document
for(var pageIndex = 0; InputPDF.Pages().Count() > pageIndex; pageIndex += 10)
{
  InputPDF.Pages().InsertFrom2(InsertPDF.Pages(),0,1,pageIndex);
}

InputPDF.Save(false);
InputPDF.Close();
InsertPDF.Close();

|

Send to folder

I mostly wanted to find out how to access the metadata from the script, so I can find where each document starts and insert a PDF in front of it.

The PDF that is being inserted is currently a pre-made PDF file, made purely for testing. I’m planning to generate it with a subprocess later, since I want these pages to contain information that is only available during the workflow. They should display the number of pages in the document that follows them.

OK, it’s a simple process. Here’s a script that should work for you:

// Open Current Job file as the PDF to modify
var InputPDF = Watch.GetPDFEditObject();
InputPDF.Open(Watch.GetJobFileName(),false);

var metadata = new ActiveXObject("MetadataLib.MetaFile");
metadata.LoadFromFile(Watch.GetMetadataFileName());

// Open a PDF file containing the page to be added to the inputPDF
var InsertPDF = Watch.GetPDFEditObject();
InsertPDF.Open("C:\\PlanetPress\\PdfInput\\Insert.pdf",false);

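// Number of Document nodes in the metadata (one per merged PDF)
var pdfCount = metadata.Job().DocumentCount();
var doc, lastPageInDoc, absolutePageIndex;

// Walk backward so that inserting a page never shifts the
// indexes of documents that have not been processed yet
for(var docIndex = pdfCount-1; docIndex>=0 ; docIndex--)
{
  doc = metadata.Job().Group(0).Document(docIndex);
  lastPageInDoc = doc.PageCount();
  // Absolute (job-wide) index of the document's last page
  absolutePageIndex = doc.DataPage(lastPageInDoc-1).IndexInJob();
  // Insert the page from Insert.pdf right after that last page
  InputPDF.Pages().InsertFrom2(InsertPDF.Pages(),0,1,absolutePageIndex+1);
}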
InputPDF.Save(false);
InputPDF.Close();
InsertPDF.Close();

First, note how the metadata object is created as an ActiveX object (MetadataLib.MetaFile), which then loads the file whose path is returned by Watch.GetMetadataFileName().

Then, the number of metadata Document nodes (i.e. the PDFs that were merged) is stored in the variable pdfCount. The script then loops backward from the last document to the first. It does so because every inserted page shifts all following pages by one: if it added pages from the first document to the last, the page indexes read from the metadata would no longer match the content of the PDF as the script moved from one document to the next.
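To see why the direction matters, here is a minimal simulation in plain JavaScript, with no PlanetPress objects involved (console.log assumes it is run outside Workflow, e.g. in Node.js). The merged PDF is modeled as an array of page labels, the last-page indexes are computed once up front (as they would be read from the metadata), and the hypothetical helper `insertAfterLastPages` splices an "INFO" page after each document, either backward or forward.

```javascript
// Model of a merged PDF: one array entry per page, labeled "D<doc>P<page>".
function buildPdf(pageCounts) {
  var pages = [];
  for (var d = 0; d < pageCounts.length; d++) {
    for (var p = 0; p < pageCounts[d]; p++) {
      pages.push("D" + d + "P" + p);
    }
  }
  return pages;
}

// Absolute (0-based) index of each document's last page, computed once
// before any insertion, like the values stored in the metadata file.
function lastPageIndexes(pageCounts) {
  var result = [];
  var offset = 0;
  for (var d = 0; d < pageCounts.length; d++) {
    offset += pageCounts[d];
    result.push(offset - 1);
  }
  return result;
}

// Hypothetical helper: insert an "INFO" page after each document's last
// page using the precomputed indexes; `backward` picks the loop direction.
function insertAfterLastPages(pageCounts, backward) {
  var pages = buildPdf(pageCounts);
  var lastIdx = lastPageIndexes(pageCounts);
  var n = pageCounts.length;
  for (var i = 0; i < n; i++) {
    var d = backward ? n - 1 - i : i;
    pages.splice(lastIdx[d] + 1, 0, "INFO");
  }
  return pages;
}

var counts = [2, 3, 1];
console.log(insertAfterLastPages(counts, true).join(" "));
// D0P0 D0P1 INFO D1P0 D1P1 D1P2 INFO D2P0 INFO
console.log(insertAfterLastPages(counts, false).join(" "));
// D0P0 D0P1 INFO D1P0 D1P1 INFO INFO D1P2 D2P0
```

Going backward, each insertion only affects pages that have already been handled, so the stale indexes stay correct; going forward, every earlier insertion pushes the later documents one page further away and the INFO pages land in the wrong places.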

In the loop, the script retrieves the number of pages in each document and uses that value to find the absolute index (relative to the entire job) of the document's last page. It then inserts your external page immediately after that last page.
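The script above places the extra page after each document. If, as in the original question, the page should come in front of each document, the insertion position would instead be the absolute index of the document's first page (in the Workflow script, presumably `DataPage(0).IndexInJob()`, untested). The plain-JavaScript sketch below models that variant on an array of page labels; `insertBeforeDocuments` is a hypothetical helper, and 0-based indexing is assumed, as the `+1` in the script implies. The "INFO:n" label stands in for an info page showing the document's page count.

```javascript
// Hypothetical helper: returns a copy of `pages` with an "INFO:<count>"
// page placed in front of each document. pageCounts[d] is the number of
// pages in document d; documents are assumed contiguous and in order.
function insertBeforeDocuments(pages, pageCounts) {
  var result = pages.slice(); // leave the caller's array untouched

  // Absolute (0-based) index of each document's first page,
  // computed once up front like the values read from the metadata.
  var firstIdx = [];
  var offset = 0;
  for (var d = 0; d < pageCounts.length; d++) {
    firstIdx.push(offset);
    offset += pageCounts[d];
  }

  // Walk backward so the earlier indexes stay valid after each insertion.
  for (var k = pageCounts.length - 1; k >= 0; k--) {
    result.splice(firstIdx[k], 0, "INFO:" + pageCounts[k]);
  }
  return result;
}

var merged = ["A1", "A2", "B1", "B2", "B3", "C1"];
console.log(insertBeforeDocuments(merged, [2, 3, 1]).join(" "));
// INFO:2 A1 A2 INFO:3 B1 B2 B3 INFO:1 C1
```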

Hope that helps.


Thanks, this helped me out big time.