I have a data file with 100k records. I have a condition in my DataMapper to “stop processing record” above record.index: 10k. I get the feeling the DataMapper is still running through all the records above 10k just to check whether they are above record.index: 10k.
A data file with 10k records is done way before the data file with 100k, yet they produce an equal number of records.
Is there a way to stop data mapping as soon as I hit 10k and really ignore the rest?
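For reference, the condition described here would look roughly like this in a JavaScript-based Condition step (a minimal sketch; the 10k threshold and the “stop processing record” action come from the post above, the rest is assumed):

```javascript
// Condition step (JavaScript mode) - sketch of the setup described above.
// When this evaluates to true, the matching branch runs an Action step set to
// "Stop processing record", so the current record is skipped.
record.index > 10000;
```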
Oh wow, I did not know this, since I use this method to generate proof files, which are generally 50 records. Having the mapper still process the full file seems a bit silly. Glad someone noticed and brought it to OL’s attention.
The feature is called “Stop processing record”, not “Stop processing data”. So it makes perfect sense, in that context, that Workflow would keep processing the rest of the file.
But as I stated, I think @Filemon’s suggestion makes sense, so we could also add that option.
Then, to ensure that only the first 10k records are processed, you use a simple text condition that checks which loop we’re on using the %i system variable. If it’s anything other than 0, we know we are not on the first chunk of data and we skip over it.
So you’d put your datamapper and the rest of the Connect plugins you’re after in the Do Something branch. Back in the main branch, if you wanted to process the entire record set, you could do so as well. Or not. Depends on what you’re wanting to do.
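Put together, the process might be laid out something like this (a rough sketch only; it assumes the data file is first split into 10k-record chunks by a splitter task, and the task and branch names are illustrative, not taken from a tested configuration):

```
Input task            (picks up the 100k-record data file)
└─ Splitter           (assumed: splits the file into 10k-record chunks, looping once per chunk)
   └─ Text condition: %i is equal to 0      (true only on the first chunk)
      ├─ "Do Something" branch: Execute Data Mapping → the rest of the Connect tasks
      └─ Main branch: skip the chunk, or keep processing the full record set if you need to
```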
%i isn’t going to be much use, I would think. It’s just giving you the current iteration of the loop, but you haven’t got a built-in way of knowing how many loops it’ll do there.
A very simple way to get the file from the last loop would be to output the split file to a temp folder at each iteration. Give it a static name on output so that each loop overwrites the previous file. After the loop, another branch downstream would be able to pick up that file and process it separately.
This doesn’t seem terribly efficient to me, though, unless you’re still processing all of the split files as well.
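In Workflow terms, that could look something like this (a rough sketch; the folder, file name, and the choice of Send to Folder / Load External File tasks are assumptions on my part, not a tested configuration):

```
Inside the splitter loop:
   Send to Folder → %t\last_chunk.dat      (static name, so each iteration overwrites the previous file)

Downstream, after the loop:
   Load External File ← %t\last_chunk.dat
   └─ Execute Data Mapping → ...           (only the chunk from the final iteration is processed)
```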
The File Store - Delete File plugin would be used in conjunction with the File Store - Download File and File Store - Upload File plugins.
They’re meant to allow you to manually make use of the file store that Connect uses when processing a job. Connect cleans up after itself, however, whereas the things you store manually won’t be cleaned up automatically. So unless you’re also using the Upload File plugin, there’s really no reason to worry about the Delete File plugin.
Temp files always get deleted at the end of the process or before the next instance of the process gets executed. You shouldn’t need to clean up anything: even files that you create yourself in that temp folder (%t) will get cleaned up.
Note that the behaviour is different when running the process manually in step-by-step mode, where the Debug folder is used to create Temp files. But that folder also gets cleaned up at some point.
Let’s move this conversation out of this topic since it is not related to the original post and will only be confusing for any future reader. Could you please copy/paste your last comment into your original post (Optimize Embedded Images) and we’ll pick up the conversation there.