Workflow and Connect Performance Guide

I really need a guide or utility for tuning Workflow and Connect for best possible performance.

How many of each type of engine should I run? What memory settings should I use for the various engines? How does the “self-replicating” setting in Workflow really work, which type of processes should get this setting, and what should the “threshold” value be?

If the Weaver Engine setting is too low, I get “glyph” errors in Output Creation; too high, and I get “IPC connection” errors, both seemingly at random.

Is there a utility I can run on my server that will profile my server and make suggested settings?

I suggest you have a look at this link:

You can also find a description of self-replicating processes in the help file: https://help.objectiflune.com/EN/pres-workflow-user-guide/2020.2/Default.html#Workflow/Interface/Process_Properties.html#toc-0
It replicates the process so that several files can be processed at once.

Those mainly describe Connect Design optimization. I want to understand the Connect Server and Engine performance settings at more than an intuitive or gut level. There have to be settings that work best depending on CPU numbers, memory amount, and so on.

I also understand what multithreading is in Workflow, but how the “threshold” works is a mystery and how the thread instances relate to the Connect Engine settings is also unclear.

There have to be settings that work best depending on CPU numbers, memory amount, and so on
Unfortunately, there aren’t, because one of the greatest impacts on performance comes from the type of jobs you are running. Lots of small jobs require different settings than a smaller number of large jobs. And of course, a mix of large and small jobs requires further fine-tuning.

Starting with Connect 2019.2, the Connect Server Configuration application makes it easier to properly configure those settings by selecting an appropriate preset. However, if you still want to customize the settings, here are a few rules of thumb:

Engine memory

  • DataMapper: engine memory should only be increased when dealing with large individual records. The size of the entire job is irrelevant. Each job is processed by a single engine, but we have found that having more than 2 DM engines running concurrently can actually have an adverse effect on overall performance, as it puts an undue load on the database.
  • Content creation: engine memory should be increased when dealing with large individual documents. The total number of documents is usually not relevant. Each individual document is always handled by a single engine, but a batch of documents can be handled by multiple engines running concurrently. There is no limit to the number of Content Creation engines you can run simultaneously, other than the physical limitations of the underlying hardware.
  • Output creation: these engines are used for paginated output as well as to provide conversion services to the rest of the system. The amount of memory to reserve for each engine depends on the overall size of the job and the complexity of the operations requested (for instance, additional content that relies on pre-processing the entire job will require more memory). In most environments, having 2 or 3 Output creation engines is more than sufficient to handle a steady load of incoming jobs.
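As a toy illustration of those rules of thumb (this is not a Connect API; the thresholds and memory figures below are placeholder assumptions you would tune against your own jobs), a starting memory figure per engine type might be picked like this:

```python
# Illustrative sketch of the engine-memory rules of thumb above.
# The thresholds and figures are placeholders, not official Connect defaults.

def suggested_engine_memory_mb(engine_type: str, largest_item_mb: float,
                               job_size_mb: float = 0.0) -> int:
    """Return a starting memory figure (in MB) for one engine of the given type."""
    if engine_type == "datamapper":
        # Scales with the largest individual record; total job size is irrelevant.
        return 640 if largest_item_mb < 5 else 1280
    if engine_type == "content":
        # Scales with the largest individual document; the document count is irrelevant.
        return 640 if largest_item_mb < 5 else 1280
    if engine_type == "output":
        # Scales with the overall job size and the complexity of post-processing.
        return 1024 if job_size_mb < 500 else 2048
    raise ValueError(f"unknown engine type: {engine_type}")

print(suggested_engine_memory_mb("datamapper", largest_item_mb=2))                # 640
print(suggested_engine_memory_mb("output", largest_item_mb=2, job_size_mb=2000))  # 2048
```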

Hardware

  • You should always consider that each engine uses one CPU thread (that is not strictly true, but we’re generalizing here). So if you have an 8-core CPU (with 16 threads), you could, for instance, run 2 DM + 6 CC + 2 OC engines, which would use up 10 of the 16 threads. You want to keep a number of cores available for the rest of the system (Workflow, MySQL, Windows, etc.).
  • Make sure you have enough RAM for each engine to run. I usually recommend 2GB of RAM per CPU thread (in our example above, that would mean 32GB of RAM). That doesn’t mean you should assign 2GB of RAM per engine: in most instances, all engines work efficiently with less than 1GB each. But the rest of the system requires memory as well, and if your system is under heavy load, you want to make sure you don’t run out of RAM, because then the system will start swapping memory to disk, which is extremely costly performance-wise.
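To make that arithmetic concrete, here is a minimal sketch (assuming only the rules of thumb above: one CPU thread per engine and roughly 2GB of RAM per thread; it is not an official sizing tool) that checks a proposed engine mix against the hardware:

```python
# Sketch of the hardware sizing arithmetic above: one CPU thread per engine,
# about 2 GB of RAM per CPU thread overall, and a few threads kept free for
# Workflow, MySQL, Windows, etc. Figures are rules of thumb, not official limits.

def check_engine_mix(cpu_threads: int, ram_gb: int,
                     dm: int, cc: int, oc: int,
                     reserved_threads: int = 4) -> None:
    engines = dm + cc + oc
    print(f"{dm} DM + {cc} CC + {oc} OC = {engines} engine threads "
          f"of {cpu_threads} available")
    if engines > cpu_threads - reserved_threads:
        print("Warning: not enough threads left for the rest of the system")
    recommended_ram_gb = cpu_threads * 2  # rule of thumb: 2 GB per CPU thread
    if ram_gb < recommended_ram_gb:
        print(f"Warning: {recommended_ram_gb} GB of RAM recommended, {ram_gb} GB present")

# The 8-core / 16-thread example from above:
check_engine_mix(cpu_threads=16, ram_gb=32, dm=2, cc=6, oc=2)
```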

Self-replicating processes
The most likely candidates for self-replication are high-availability, low-processing flows. For instance, web-based processes usually require the system to respond very quickly, so they are prime candidates for self-replication. Another example is processes that intercept print queues (or files) and simply route them to different outputs with little or no additional processing.

Processes that handle very large jobs are least likely to benefit from self-replication, because each process may hog a number of system resources (disk, RAM, DB access), and having several processes fighting for those resources concurrently can have an adverse effect on performance.

By default, Workflow allows a maximum of 50 jobs to run concurrently, and each self-replicating process can then use a certain percentage of that maximum. Obviously, if you have 10 processes that are each set to use up to 20% of that maximum, your processes will be fighting amongst themselves for the next available replicated thread. Note that with modern hardware (let’s stick with our 16vCore/32GB RAM system), you can easily double or even triple that maximum. On my personal system, I have set my max to 200 concurrent jobs, and that works fine. But that’s because my usage is atypical: I run a lot of performance tests, but I don’t need to prioritize one process over another.
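As a back-of-the-envelope illustration of that percentage arithmetic (a sketch of the contention described above, not of Workflow’s actual scheduler):

```python
# Sketch: each self-replicating process may use up to a percentage of the
# global maximum number of concurrent jobs. The numbers mirror the example
# above (10 processes at 20% of a 50-job maximum) and are purely illustrative.

MAX_CONCURRENT_JOBS = 50  # Workflow default mentioned above

process_thresholds = {f"process_{i}": 0.20 for i in range(1, 11)}

potential_demand = sum(int(MAX_CONCURRENT_JOBS * pct)
                       for pct in process_thresholds.values())
print(f"Potential demand: {potential_demand} replicated instances "
      f"against a cap of {MAX_CONCURRENT_JOBS}")
# 100 against 50: the processes will contend for the next available thread.
```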

Final note on self-replication: if you are heavily using the PlanetPress Alambic module to generate/manipulate PDFs directly from Workflow, then you should be careful when self-replicating those processes because there is a limit to the number of Alambic engines that can run concurrently. Having 10 cloned processes fighting for 4 instances of Alambic will create a bottleneck.

Conclusion
Remember that all of the above are general rules of thumb. Many other users (and in fact, many of my own colleagues!) would probably tell you that from their own experience, the values I quoted are not the most efficient. But hey, gotta start somewhere!

Hope that helps a bit.


Adding to Phil’s very thorough explanation: Workflow is still a 32-bit app (we’re working on that - can’t say more), which means it can address at most 1.8 GB of memory in total. So the number of replicated processes should be balanced not only against processors and the like, but also against the size of the data they will be processing concurrently.

Going back to those 200 replicated processes Phil mentioned: if they are small jobs, or Connect jobs for which you are using only the IDs in the metadata, it should be fine. But if you download an entire DataSet into the metadata for a 100 MB XML file, things might get problematic very quickly.

This can sometimes be hard to gauge, for the same reasons Phil stated at the beginning of his comment. Fortunately, Workflow is usually pretty good at managing its memory efficiently. The vast majority of its plugins stream the data rather than loading it entirely into memory. For the metadata, the size of the file is roughly 1:1 with its size in memory. The biggest memory hog is XML, which, for reasons out of scope of this discussion (read up on “DOM memory size” if you feel adventurous), can take 10x its file size in memory when being read.
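A rough way to reason about that limit (purely an illustration using the figures quoted above: a ~1.8 GB address space, metadata at roughly 1:1 with file size, and XML at roughly 10x its file size while being read as a DOM):

```python
# Sketch: estimate whether concurrent data could push a 32-bit Workflow
# service toward its ~1.8 GB address space. The factors are the rough
# figures quoted above, not measured values.

ADDRESS_SPACE_GB = 1.8

def estimated_memory_gb(file_size_mb: float, kind: str) -> float:
    factors = {
        "streamed": 0.0,   # most plugins stream data rather than load it whole
        "metadata": 1.0,   # metadata sits roughly 1:1 with its file size
        "xml_dom": 10.0,   # an XML DOM can take ~10x the file size while being read
    }
    return file_size_mb * factors[kind] / 1024

footprint = (estimated_memory_gb(100, "xml_dom")   # the 100 MB XML example above
             + estimated_memory_gb(20, "metadata"))
print(f"Estimated footprint: {footprint:.2f} GB of {ADDRESS_SPACE_GB} GB available")
```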

We are 3 years later now and Workflow is still a 32-bit application. I’ve had 5 out-of-memory issues while using Workflow today; this is by far the biggest issue with PReS Connect, and I’m yet to see any solution for it. I honestly can’t believe how long this issue has been known about and still hasn’t been resolved. I’m starting to think I need to look at other options, as I don’t think this pretty stupid issue will ever be fixed. It’s not as if 64-bit is a new concept; it’s been around for well over 10 years now!

The problem with Workflow is that its development environment does not support 64 bits, which means we would have to rewrite it pretty much from scratch in a 64-bit environment, something we decided against because the costs would be prohibitive.

However, we have been working hard on Automate, a different type of automation application based on Node-RED, which is 64-bit and Unicode-compliant. It is currently in Tech Preview mode, which means you can install it and play around with it. We have a forum dedicated to it as well.


Hi Phil

But when is the planned release of this? I understand it’s been talked about as a product for at least 3 years?

James

It is available right now (in fact, we published an update this morning). It is still in Tech Preview mode; we are aiming to officially release it later this year.

Hi Phil,

Are you able to elaborate more on this for me?

Are you saying that each document/PDF being generated is handled by only one engine?

Take a Process for example that’s triggered by Folder Capture (1 file) and goes through data mapper, content creation, job preset and output preset.

^ This process will only use 1 engine at a time? I was under the impression that 1 process can allocate multiple engines to make the process run faster.

Or does it mean multiple engines only come into play when multiple processes are running concurrently?

Thank you in advance.
Kind regards,
Ernani

In my explanation, document means any content generated from a single data record. A Job is a collection of Documents. So if you are processing an XML job file from which you extract 100 records, then content creation will be creating 100 documents for that single job.

Of course, each record may vary greatly in size: some records may generate a single page of content (like a collection letter, for instance) while others may generate loads of pages (a Telecomm invoice, for instance). That’s why it’s important to remember that each individual record in the job is processed by a single engine.

Take a Process for example that’s triggered by Folder Capture (1 file) and goes through data mapper, content creation, job preset and output preset.
^ This process will only use 1 engine at a time?

No. The DataMapper will use one engine to process the file and will generate, for instance, 1000 records. Then those records are distributed across several Merge Engines. So let’s say you have 5 Merge Engines; that means each engine will be generating content for roughly 200 records (this may vary depending on a number of conditions, but you get the idea). Then the Job Preset is processed by the Server Engine, and the Output Preset is handled by one of the Output Engines. Having several Output Engines allows you to more efficiently process multiple Output Presets simultaneously.
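To visualise that distribution, here is a simplified sketch (the real Connect Server scheduler is more sophisticated and depends on the conditions mentioned above):

```python
# Simplified sketch of 1000 records being spread across 5 Merge Engines.
# The actual distribution inside the Connect Server may differ.

RECORDS = 1000
MERGE_ENGINES = 5

per_engine = [RECORDS // MERGE_ENGINES] * MERGE_ENGINES
for i in range(RECORDS % MERGE_ENGINES):  # spread any remainder
    per_engine[i] += 1

for engine, count in enumerate(per_engine, start=1):
    print(f"Merge engine {engine}: ~{count} records")
```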

Then, those records are distributed across several Merge Engines.

I see, so the job then uses multiple Merge Engines to process batches of records from one DataMapper engine, based on the server configuration.

Taking one process, as in your example:

Job file = XML
DataMapper = one DM engine > 1000 records
Content creation = 5 Merge Engines (based on config), each handling ~200 records
Job preset = the Server Engine
Output preset = one Weaver (Output) Engine

So the notion of multiple engines running a job applies within one process. Although the screenshot (highlighted) below is still a bit confusing.

I’m also going to put reference links here, as understanding this is not so straightforward, at least for me.
https://help.uplandsoftware.com/objectiflune/en/olconnect/2023.2/General/Connect_Architecture.html
https://help.uplandsoftware.com/objectiflune/en/olconnect/2023.2/ServerConfig/Parallel_Processing_Preferences.html

Thanks so much!

The option you highlighted is documented on this help page. As explained in the help, it should only be changed on test or low-memory systems.

As for the mapping of the various engines, I admit it’s not easy to wrap your head around. We have tried to make it as simple as we could by offering a variety of presets, but as soon as you start customizing those presets, you need a good understanding of the parameters.