Best way to reduce record range

Hello,

In many processes I have different approaches to reduce the range of records used for a Connect template. In some cases I reduce the input file, in others I store the record information in the repository and only fetch what is needed, sometimes I use a filter to select the desired records, and so on…

Depending on the data and the requirements, it makes sense to use different approaches to get the best/most performant result.

But I often face the following challenge:
I get a data file (CSV, XML or PDF) containing an unspecified number of records. The data file should be processed in the Connect DataMapper to create a data model for a Connect Designer template (the easy standard procedure).
Often customers want two separate outputs in one automation process: in the first step they need a PDF file based on records 1 to 5, and in the second step they want an output file based on all records.

What is the easiest way to set the record range in an automated Connect process?

Hi Thomas,

I looked for a non-coding way to do this for proofing. All my Workflow processes generate normal output using the entire database; I then generate 50 samples to create a proofing PDF that is sent off to the client for approval.

Here is one approach using CSV data. Extract all your fields and, once done, add a condition based on JavaScript with record.index as the expression, the Operator set to “is less than”, and the Operand type set to Integer, based on the Automation property JobInfo9.

The steps for a simple CSV: [screenshot]

Condition settings: [screenshot]

In the false branch I use an Action step set like below: [screenshot]
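
For reference, the effective test the condition performs looks like this (a sketch; in the UI the right-hand operand is simply configured as the JobInfo9 automation property, and I’m assuming job infos are exposed through the DataMapper’s automation object):

	// True branch: keep extracting; false branch: the Action step above
	record.index < parseInt(automation.jobInfo.JobInfo9);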

With this method you would need to run the data mapper more than once in Workflow. Use a Set Job Infos and Variables plugin before your data mapper and set JobInfo9 to one more than the number of records you want to output. So in your example you would set JobInfo9 to 6, and the condition in the mapper only outputs records 1 through 5.
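
If you prefer a Run Script step over the Set Job Infos and Variables plugin, the Workflow side is a one-liner (a JScript sketch using the standard Watch object):

	// Run Script (JScript) before the DataMapper step:
	// limit the proof run to records 1-5
	Watch.SetJobInfo(9, "6");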

Perhaps there is a more elegant way. I’m stuck on Connect 2021.1 until we update soon.

Regards,
S


Hi Sharne,

Thank you for sharing your solution. In this case I would have to create two separate DataMapper configurations.
Or better: I could use another condition based on a second JobInfo with the value “print” or “proof”, so I only have to maintain one configuration instead of two.

In JavaScript terms:

if (Jobinfo8 == "proof") {
	if (record.index < parseInt(Jobinfo9)) {
		// extract data (only the record range)
	} else {
		// stop data mapping
	}
} else {
	// extract data (all records)
}
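
In an actual DataMapper condition, that whole decision could even collapse into a single expression feeding a “Stop data mapping” action on its true branch (a sketch, again assuming job infos are read through the automation object):

	// True when proofing and the record limit has been reached:
	automation.jobInfo.JobInfo8 == "proof" &&
		record.index >= parseInt(automation.jobInfo.JobInfo9);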

I like that approach and am thinking about recommending it.

But maybe there is an even easier way, with better performance than executing a DataMapper configuration twice in the Workflow process?

My attempt was to process all records in one DataMapper step and then reduce the metadata (the DataMapper output) in a branch with a Metadata Filter. That works, but I don’t like that I first have to add an additional field to the Workflow metadata (an index number per document); otherwise I cannot set a condition in the Metadata Filter plugin.
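
One way to get that index field without an extra Workflow step is to extract it in the DataMapper itself as a JavaScript-mode field (a sketch; the field name is my own), so the Metadata Filter has something to compare against:

	// Extraction step, field "RecordIndex", mode: JavaScript
	record.index;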

Hi Thomas,

You only need one data mapper and one Workflow process. In my process I set %9 to 100000 in the first branch because I know the data for that job will never reach that number, so all records are processed there. Underneath that is the Proofs branch, where I set %9 to 31 so that my proofs file/PDF contains only 30 records.
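
In script terms, the two branches just feed different limits into the same data mapper (a JScript sketch; in practice this is Set Job Infos and Variables plugin configuration rather than code):

	// Main output branch: a limit no real job will ever reach
	Watch.SetJobInfo(9, "100000");

	// Proofs branch: records 1-30 only
	Watch.SetJobInfo(9, "31");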

“But maybe there is an even easier way, with better performance than executing a DataMapper configuration twice in the Workflow process?”

I went this route years ago and it is fine for my needs. Whether there is a better, more efficient way in Connect 2023 is beyond my scope; as mentioned, I’m on 2021.1. Perhaps one of the OL staff here can advise on a better solution.

Regards,
S

@Sharne: your approach is the correct one.

You could improve slightly on it by also passing a starting record index (for instance, you could store a JSON string like {"start":10,"end":20} in the parameters that Workflow passes to the DataMapper), and then your data mapping configuration could use a combination of the “Skip to next record” and “Stop data mapping” actions to control the exact range of records to process for the file.
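
A sketch of what the two conditions could look like, assuming a runtime parameter named range holds that JSON string and that parameters are exposed through the DataMapper’s automation object:

	// Condition feeding the "Skip to next record" action:
	record.index < JSON.parse(automation.parameters.range).start;

	// Condition feeding the "Stop data mapping" action:
	record.index > JSON.parse(automation.parameters.range).end;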

Hi Phil,

That would be a nice way to also perform recoveries for print files. A user edits a .json file and drops it into a hot folder; Workflow reads the .json file via a script and that range of records gets reprocessed. Useful for PostScript output. Great idea!
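
A minimal JScript sketch of that Workflow script, assuming the dropped .json file is the current job file and contains the {"start":…,"end":…} string from above:

	// Run Script (JScript): read the range from the job file and
	// hand it to the DataMapper via job infos
	var fso = new ActiveXObject("Scripting.FileSystemObject");
	var file = fso.OpenTextFile(Watch.GetJobFileName(), 1);
	var range = eval("(" + file.ReadAll() + ")"); // classic JScript has no JSON.parse
	file.Close();
	Watch.SetJobInfo(8, String(range.start));
	Watch.SetJobInfo(9, String(range.end));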

I just realised I had another way to “reduce record range”; however, around 2019 support said my idea would not work due to a SQL bug in either the job preset or the output preset. I don’t think it was fixed in 2021.1 but it might be in 2023. I will have to check my OL support ticket emails to remember the idea, and I will post here after I have revisited it.

(I really wish OL had not closed down in RSA, as dealing with resellers is a lot slower. I cannot wait to upgrade to 2023.)

Regards,
S