It’s very rare that the Workflow processes are themselves the bottleneck: after all, Workflow simply executes a sequence of commands that are (mostly) handled by other applications/devices. But I can’t speak for your specific case because I don’t know what your time-consuming scripts do.
For instance, if you have a script that makes REST calls to a server in order to obtain additional data that must be embedded in your original data, adding more CPUs to your system will not have any impact on the network latency that the REST calls introduce into the process.
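To put rough numbers on that, here is a back-of-the-envelope sketch in Python (not an actual Workflow script). The figures are made up, and `script_time` and `cpu_speedup` are names I invented purely for illustration; it assumes the calls are made one after the other.

```python
def script_time(calls, latency_s, cpu_s, cpu_speedup=1.0):
    """Estimated wall-clock seconds for `calls` sequential REST calls.

    Each call waits `latency_s` on the network, then spends `cpu_s` on local
    processing, which is the only part that better CPU resources can shorten.
    """
    return calls * (latency_s + cpu_s / cpu_speedup)


# Hypothetical figures: 1000 calls, 200 ms of latency and 5 ms of processing each.
baseline = script_time(1000, 0.200, 0.005)
faster_cpu = script_time(1000, 0.200, 0.005, cpu_speedup=2.0)

print(f"baseline:   {baseline:.1f} s")
print(f"faster CPU: {faster_cpu:.1f} s")
```

With these made-up numbers, doubling the processing speed saves only a little over 1% of the total time, because nearly all of it is spent waiting on the network.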
Or if your scripts are updating a database with large queries, increasing the number of parallel processes may actually end up slowing down the entire solution because the database engine may be overwhelmed by all the simultaneous requests.
If your processes are writing huge files to disk, then the same is true: you may pound your hard drive with so much data that it has trouble keeping up with a high volume of I/O requests. I’m sure you have experienced this kind of stuff with your anti-virus software at some point: your entire machine seems to slow down for a few minutes, and it’s mostly because of disk activity.
In the case of Workflow, which uses and generates a large number of files, investing in high-speed storage (like SSDs or, even better, NVMe drives) is likely to have a lot more impact than increasing the number of processes. In the case of the Merge Engines, which require a fair amount of computing power to merge data onto templates, more processors mean you can run more merge engines concurrently, thereby providing better throughput… up to a limit, because at some point it’s the database engine that may start to feel starved for resources.
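That "up to a limit" part can be pictured with a very simplified model, again in Python and again with invented numbers. `throughput`, `pages_per_engine` and `shared_limit` are hypothetical names, not Connect settings, and the assumption that engines scale linearly until a shared resource saturates is mine.

```python
def throughput(engines, pages_per_engine, shared_limit):
    """Pages per minute for `engines` merge engines running in parallel.

    Engines scale linearly until they reach `shared_limit`, the most that the
    shared resource (database engine, disk) can sustain.
    """
    return min(engines * pages_per_engine, shared_limit)


# Hypothetical figures: 100 pages/min per engine, shared limit of 500 pages/min.
for engines in (1, 2, 4, 8, 16):
    print(engines, "engines:", throughput(engines, 100, 500), "pages/min")
```

In this model the curve simply flattens once the shared limit is reached; on a real system, contention between engines can make throughput actually drop past that point, so the only reliable way to find the sweet spot is to measure.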
Self-replicating processes in Workflow are best suited for short, on-demand processes (like delivering web pages or 1-page print jobs, for instance). For large jobs that generate thousands of PDF pages, you should not go crazy on the number of engines that run in parallel because they will all be competing for system resources, which can ultimately have the opposite effect from what you were expecting.
Sorry to speak in such broad, generic terms, but each situation is different so it’s almost impossible to give you specific pointers unless I were to analyze your entire solution’s architecture.