Sorry, I was just about to jump into a meeting, so I answered a bit quickly. Reading back, I realize that I should have been a little more thorough in my answer…
Whenever a workflow process is triggered by new files, it takes a snapshot of all files in the monitored folder. It then processes that list and once it’s done, it goes back one more time to see if new files have come in while the processing was going on. It builds a new snapshot of all the files, processes them and the entire procedure is repeated until no more files are found.
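If it helps to picture that loop, here is a minimal Python sketch of the snapshot-and-repeat behavior described above. It is only an illustration, not how the Workflow service is actually implemented: `process_folder` and `handle_file` are hypothetical names, and I am assuming a processed file is removed from the monitored folder.

```python
import os

def process_folder(folder, handle_file):
    """Repeat snapshot-and-process passes until a snapshot comes back empty.

    Returns the number of passes that found at least one file.
    """
    passes = 0
    while True:
        # Snapshot: the files present in the monitored folder right now.
        snapshot = sorted(
            name for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))
        )
        if not snapshot:
            break  # nothing arrived during the last pass, so we are done
        passes += 1
        for name in snapshot:
            path = os.path.join(folder, name)
            handle_file(path)
            os.remove(path)  # assumption: a processed file leaves the folder
    return passes
```

Files that arrive while a pass is running are not part of the current snapshot, so they wait for the next pass.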
Imagine that this process is not self-replicating: that means the files will be processed sequentially, one after the other. In that case, it would be much more efficient to have separate processes, each monitoring its own folder, because then you would process all folders in parallel. I think that’s a pretty clear-cut argument in favor of having multiple processes.
But if the process is self-replicating (for argument’s sake, let’s say it is set to have up to 10 self-replicated instances and the folder contains 100 files), then after the snapshot has been taken, 10 instances of the process are created and each of them is handed 10 files, which means the files are processed in parallel. Now what’s the difference between those 10 self-replicated processes and 10 separate processes? Well… not much. As I stated in my initial answer, the self-replicated process might be very slightly slower than a standard process because it first has to be cloned before it can start processing its share of files. The cloning procedure takes milliseconds… at most.
That’s why I stated having multiple separate processes may fare marginally better, but not by much.
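To make the 100-files, 10-instances scenario concrete, here is a rough Python sketch of splitting one snapshot into batches and running the batches in parallel. It is an analogy only: `split_snapshot`, `run_replicated` and `handle_file` are hypothetical names, and worker threads stand in for the self-replicated instances. The way Workflow actually hands files to its clones may differ.

```python
from concurrent.futures import ThreadPoolExecutor

def split_snapshot(files, max_instances):
    """Split a snapshot into at most max_instances near-equal batches."""
    count = min(max_instances, len(files))
    if count == 0:
        return []
    size, extra = divmod(len(files), count)
    batches, start = [], 0
    for i in range(count):
        # The first `extra` batches take one more file each.
        end = start + size + (1 if i < extra else 0)
        batches.append(files[start:end])
        start = end
    return batches

def run_replicated(files, max_instances, handle_file):
    """Process each batch in its own worker (a stand-in for a cloned instance)."""
    batches = split_snapshot(files, max_instances)
    if not batches:
        return []
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        results = pool.map(lambda batch: [handle_file(f) for f in batch], batches)
        return [r for batch in results for r in batch]
```

With 100 files and `max_instances=10`, `split_snapshot` returns 10 batches of 10 files each, which matches the example above.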
I personally prefer to use fewer processes and rely heavily on the self-replication feature. It makes managing and maintaining processes much easier. However, if using a single self-replicating process means that you have to add several conditions inside that process to account for the variations in processing different types of data files, then you might start seeing a more obvious difference in performance.
That’s because every condition takes milliseconds to resolve (and sometimes it can be several milliseconds, for instance if the condition needs to extract some content out of a PDF). For a single file, it won’t make much of a difference. But after a few thousand files, those extra operations add up.
In that case, you’re better off with separate processes.
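Here is a quick back-of-the-envelope calculation in Python to show how those milliseconds add up. The numbers (3 extra conditions per file, 2 ms per condition, 5000 files) are made up for illustration and are not measurements. The total is the cumulative processing time, so with self-replication the wall-clock impact would be spread across the instances.

```python
def added_seconds(file_count, conditions_per_file, ms_per_condition):
    """Total extra time, in seconds, spent resolving conditions."""
    return file_count * conditions_per_file * ms_per_condition / 1000

# Hypothetical figures, for illustration only.
print(added_seconds(1, 3, 2))     # prints 0.006 (one file: negligible)
print(added_seconds(5000, 3, 2))  # prints 30.0 (a few thousand files: noticeable)
```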
Not the clear-cut answer you were expecting, I’m sure, but at least now you have more info on which to base your implementation.