2 Data files merged into a single output

Hi Guys,

I’m dealing with a situation where I need to handle two PDF data files:

  1. Perform individual processing on each of these two data files.
  2. Eventually, combine them into a single output.

The challenge I’m encountering is that the Workflow processes aren’t inherently aware that there are two data files to process. Each process handles one file at a time, so the merge step can fire too early, run multiple times, or miss a file entirely. I’ve considered commingling as a potential solution, but I’m open to other suggestions.

Do any of you have ideas on how I might approach this?

Thank you in advance.

James.

Have you considered joining them and then processing that single file using 2 different data mapping configs? Since the DM has the ability to skip records and to stop processing a file when certain conditions are met, this could be done fairly easily.

So your PDF would be PDF1+PDF2

Then your DataMapper1 would receive that combined file and it would first evaluate a condition that’s specific to PDF1 … as soon as it becomes false, then the DataMapper1 stops processing the rest of the file.

Then DataMapper2 would receive the file, skip all the first records until it finds one that meets a condition specific to PDF2, and it would then process the file from that point to the end.
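The skip/stop behaviour described above can be sketched outside the DataMapper. Here is a minimal Python illustration of the two passes over one combined file; `is_pdf1_record` and `is_pdf2_record` are hypothetical stand-ins for whatever conditions would actually identify a PDF1 vs. a PDF2 record in your data:

```python
def is_pdf1_record(record):
    # Hypothetical condition specific to PDF1 records.
    return record.startswith("P1")

def is_pdf2_record(record):
    # Hypothetical condition specific to PDF2 records.
    return record.startswith("P2")

def datamapper1(records):
    """Process from the start; stop as soon as the PDF1 condition fails."""
    out = []
    for rec in records:
        if not is_pdf1_record(rec):
            break  # stop condition met: ignore the rest of the file
        out.append(rec)
    return out

def datamapper2(records):
    """Skip records until the PDF2 condition is met, then process to the end."""
    out = []
    found = False
    for rec in records:
        if not found and not is_pdf2_record(rec):
            continue  # skip the leading PDF1 records
        found = True
        out.append(rec)
    return out

combined = ["P1-a", "P1-b", "P2-a", "P2-b"]  # PDF1+PDF2 joined
print(datamapper1(combined))  # ['P1-a', 'P1-b']
print(datamapper2(combined))  # ['P2-a', 'P2-b']
```

Each “mapper” sees the whole combined file but only keeps its own half, which is exactly the skip/stop trick.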

Not sure if that applies to your case, but I thought I’d mention the option.

Hi @Phil,

That sounds like a tempting option 🙂 But it might make things a bit complicated.

Let me try and illustrate the issue with a diagram. I need to perform some operations on PDF1 and PDF2 with separate Datamapper1+template1 and Datamapper2+template2. Afterward, these processed files need to be merged into a single PDF3 file, followed by additional operations on this combined file.

Consider the workflow: PDF1 undergoes its operations in process P1, and the result is stored on disk or in a database. PDF2 is processed by P2, and its output is stored in the same location as the output of P1. PDF1 and PDF2 can be dropped simultaneously or one at a time. The challenge lies in how process P3 can determine when P1, P2, or both have concluded. One possible solution is to create trigger files for the P3 process to monitor. The illustration below, while not an exact representation of the workflow, may help explain the issue.

Thank you for your help in advance

Hi James,

Is there a way to determine that the files belong together? I assume some kind of naming convention, is that correct?

Erik

Hi @Erik

That is correct, we can assume PDF1.pdf and PDF2.pdf are the filenames.

In Workflow, you could use the File Count input task. This input task monitors a specific folder for a specific file mask and triggers its process only when the total number of files matching the criteria equals the target number specified in the task:

(screenshot: File Count input task configuration)
In this example, the task is monitoring the C:\Output folder for files whose names match MyPDFxxx.pdf.

Note, however, that this will only work if you are processing files one pair at a time (i.e. the two files can be dropped separately, but each pair must have finished processing before you drop the first file of the next pair).