I have a large job of over 1.3 million records, each producing three PDF outputs: a 1-up version used for reprinting out of Acrobat, a 2-up version for printing, and a “1 PDF per document” output returned to the client so they can pull up individual documents if necessary.
We are splitting the records into batches of 10,000.
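For context, the batching itself is the simple part; conceptually it is just chunking the input, something like this (a simplified sketch, not our actual job code, and the names here are just illustrative):

```python
# Simplified sketch of how we chunk the input (illustrative only,
# not our real Connect job; record IDs stand in for actual records).
def batches(records, batch_size=10_000):
    """Yield successive batches of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

records = list(range(1_300_000))          # stand-in for the real record set
print(sum(1 for _ in batches(records)))   # 130 batches of 10,000 records
```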
I want to optimize Connect's performance for these 10,000-record jobs. I have a document that discusses performance packs and dedicating “parallels” to large jobs.
Which engine does what? What is the difference between the MergeEngine and the WeaverEngine? And what are the proper settings to maximize my output speed on these jobs?
I just don’t know how to interpret and apply sentences like this one from the performance guide:
“If processing a small number of very large records (when each individual record is composed of a large number of pages), more instances with an equal amount of speed units is better. For hardware, RAM and Hard Drive speeds are most important, since the smallest divisible part (the record) cannot be split on multiple machines or even cores”
The definition of a “large record” baffles me. Does it mean a large number of fields within a record? The parenthetical refers to “a large number of pages”, but what is a “page” in this context? Should that read “documents”? I have just one document per record; is that “large”? And “instances” of what? Of which engine?
A Small-, Medium-, or Large-sized job is defined by the total number of records in that job.
By default:
Small is 100 records or fewer
Medium is more than 100 but fewer than 10,000
Large is 10,000 records or more
These are user-definable values; the setting is found on the Scheduling tab of the Connect Server Configuration. You specify the Small and Large thresholds, and Medium is automatically whatever falls between them.
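To make the boundaries concrete, here is a minimal sketch of that classification logic. This is not Connect's actual code; the constants just stand in for the two thresholds you set on the Scheduling tab, shown at their defaults:

```python
# Hypothetical sketch of the job-size classification (illustrative only;
# the real logic lives inside the Connect Server, not in user code).
SMALL_MAX = 100      # "Small" threshold from the Scheduling tab (default)
LARGE_MIN = 10_000   # "Large" threshold from the Scheduling tab (default)

def job_size(record_count: int) -> str:
    """Classify a job by its total record count."""
    if record_count <= SMALL_MAX:
        return "Small"
    if record_count >= LARGE_MIN:
        return "Large"
    return "Medium"  # whatever falls between the two thresholds

print(job_size(100))     # Small
print(job_size(9_999))   # Medium
print(job_size(10_000))  # Large
```

So your 10,000-record batches land right at the default Large threshold, which is worth keeping in mind when you tune those values.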