Best way to reduce record range

Hello,

In many processes I have different approaches to reduce the range of records used for a Connect template. In some cases I reduce the input file, in others I store the record information in the repository and only fetch what is needed, sometimes I use a filter to select the desired records, and so on…

Depending on the data and the requirements, it makes sense to use different approaches to get the best/most performant result.

But I often face the following challenge:
I get a data file (CSV, XML or PDF) containing an unspecified number of records. The data file should be processed in the Connect DataMapper to create a data model for a Connect Designer template (the easy standard procedure).
Often customers want two separate outputs in one automation process: in the first step they need a PDF file based on records 1 to 5, and in the second step they want an output file based on all records.

What is the easiest way to set the record range in an automated Connect process?

Hi Thomas,

I looked for a non-coding way to do this for proofing. All my Workflow processes generate normal output using the entire database; I then generate 50 samples to create a proofing PDF that is sent off to the client for approval.

Here is one approach using CSV data. Extract all your fields and, once done, add a condition based on JavaScript with record.index as the expression, the Operator set to “is less than”, and the Operand type set to Integer, based on the Automation property JobInfo9.

The steps for a simple CSV: [screenshot]

Condition settings: [screenshot]

In the false branch I use an Action step set like below: [screenshot]
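
For reference, the effective test the condition performs looks like this (a sketch; in the UI the right-hand operand is simply configured as the JobInfo9 automation property, and I’m assuming job infos are exposed through the DataMapper’s automation object):

	// True branch: keep extracting; false branch: the Action step above
	record.index < parseInt(automation.jobInfo.JobInfo9);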

With this method you would need to run the data mapper more than once in Workflow. Use a Set Job Infos and Variables plugin before your data mapper and set JobInfo9 to one more than the number of records you want to output. So in your example you would set JobInfo9 to 6, and the condition in the mapper only outputs records 1 through 5.
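
If you prefer a Run Script step over the Set Job Infos and Variables plugin, the Workflow side is a one-liner (a JScript sketch using the standard Watch object):

	// Run Script (JScript) before the DataMapper step:
	// limit the proof run to records 1-5
	Watch.SetJobInfo(9, "6");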

Perhaps there is a more elegant way. I’m stuck on Connect 2021.1 until we update soon.

Regards,
S


Hi Sharne,

Thank you for sharing your solution. In this case I would have to create two separate DataMapper configurations.
Or better: I could use another condition based on a second JobInfo with the value “print” or “proof”, so I only have to maintain one configuration instead of two.

In JavaScript terms:

if (Jobinfo8 == "proof") {
	if (record.index < parseInt(Jobinfo9)) {
		// extract data (only the record range)
	} else {
		// stop data mapping
	}
} else {
	// extract data (all records)
}
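
In an actual DataMapper condition, that whole decision could even collapse into a single expression feeding a “Stop data mapping” action on its true branch (a sketch, again assuming job infos are read through the automation object):

	// True when proofing and the record limit has been reached:
	automation.jobInfo.JobInfo8 == "proof" &&
		record.index >= parseInt(automation.jobInfo.JobInfo9);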

I like that approach and am thinking about recommending it.

But maybe there is an even easier way, with better performance than executing a DataMapper configuration twice in the Workflow process?

My attempt was to process all records in one DataMapper step and then reduce the metadata (the DataMapper output) in a branch with a Metadata Filter. That works, but I don’t like that I first have to add an additional field to the Workflow metadata (an index number per document); otherwise I cannot set a condition in the Metadata Filter plugin.
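
One way to get that index field without an extra Workflow step is to extract it in the DataMapper itself as a JavaScript-mode field (a sketch; the field name is my own), so the Metadata Filter has something to compare against:

	// Extraction step, field "RecordIndex", mode: JavaScript
	record.index;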

Hi Thomas,

You only need one data mapper and one Workflow process. In my process I set %9 to 100000 in the first branch because I know the data for that job will never reach that number, so all records are processed there. Underneath that is the Proofs branch, where I set %9 to 31 so that my proofs file/PDF contains only 30 records.
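
In script terms, the two branches just feed different limits into the same data mapper (a JScript sketch; in practice this is Set Job Infos and Variables plugin configuration rather than code):

	// Main output branch: a limit no real job will ever reach
	Watch.SetJobInfo(9, "100000");

	// Proofs branch: records 1-30 only
	Watch.SetJobInfo(9, "31");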

“But maybe there is an even easier way, with better performance than executing a DataMapper configuration twice in the Workflow process?”

I went this route years ago and it is fine for my needs. Whether there is a better, more efficient way in Connect 2023 is beyond my scope; as mentioned, I’m on 2021.1. Perhaps one of the OL staff here can advise on a better solution.

Regards,
S

@Sharne: your approach is the correct one.

You could improve slightly on it by also passing a starting record index (for instance, you could store a JSON string like {"start":10,"end":20} in the parameters that Workflow passes to the DataMapper), and then your data mapping configuration could use a combination of the “Skip to next record” and “Stop data mapping” actions to control the exact range of records to process for the file.
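
A sketch of what the two conditions could look like, assuming a runtime parameter named range holds that JSON string and that parameters are exposed through the DataMapper’s automation object:

	// Condition feeding the "Skip to next record" action:
	record.index < JSON.parse(automation.parameters.range).start;

	// Condition feeding the "Stop data mapping" action:
	record.index > JSON.parse(automation.parameters.range).end;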

Hi Phil,

That would be a nice way to also perform recoveries for print files. A user edits a .json file and drops it into a hot folder; Workflow reads the .json file via a script and that range of records gets reprocessed. Useful for PostScript output. Great idea!
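
A minimal JScript sketch of that Workflow script, assuming the dropped .json file is the current job file and contains the {"start":…,"end":…} string from above:

	// Run Script (JScript): read the range from the job file and
	// hand it to the DataMapper via job infos
	var fso = new ActiveXObject("Scripting.FileSystemObject");
	var file = fso.OpenTextFile(Watch.GetJobFileName(), 1);
	var range = eval("(" + file.ReadAll() + ")"); // classic JScript has no JSON.parse
	file.Close();
	Watch.SetJobInfo(8, String(range.start));
	Watch.SetJobInfo(9, String(range.end));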

I just realised I had another way to “reduce record range”; however, around 2019 support said my idea would not work due to a SQL bug in either the job preset or the output preset. I don’t think it was fixed in 2021.1 but it might be in 2023. I will have to check my OL support ticket emails to remember the idea, and I will post here after I have revisited it.

(I really wish OL had not closed down in RSA, as dealing with resellers is a lot slower. I cannot wait to upgrade to 2023.)

Regards,
S