Datamapper error in Workflow, how to debug

I have a CSV with about 4,000 rows. If I create the output via Designer, everything runs fine.

If I do the same via Workflow in single steps (same template, datamapper, output and job creation settings), I get a rollback error during the datamapper task.

If I split the same csv in smaller parts (about 500 records each) it works.

If I use the “all-in-one” task it works, too (don’t save records checked).

Hi RalfG,

It seems to me that the output you’re trying to create is taking up a lot of CPU and other resources, and it might just be getting too heavy for your system. This would explain why splitting the job fixes the problem: you end up running multiple small jobs, and resources are cleared between jobs, as opposed to running one huge job. It would also explain why the “All-In-One” task works, since it is more optimized and performs better than creating the output in four separate steps.
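As a rough illustration of the splitting approach, here is a minimal Python sketch that cuts a CSV into parts of 500 records each before it is handed to Workflow. It is not an OL Connect or Workflow API; `split_csv` and `_write_part` are hypothetical helper names, and the sketch assumes a comma-delimited, UTF-8 file whose first row is a header that should be repeated in every part.

```python
import csv
from pathlib import Path


def _write_part(out_dir, stem, index, header, rows):
    """Write one part file (header + rows) and return its path."""
    path = out_dir / f"{stem}_part{index:03d}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(source, out_dir, rows_per_file=500):
    """Split a CSV into numbered part files, repeating the header in each.

    Assumptions (not taken from the thread): comma delimiter, UTF-8
    encoding, first row is a header.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    with source.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split

        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                written.append(
                    _write_part(out_dir, source.stem, len(written) + 1, header, chunk)
                )
                chunk = []
        if chunk:  # remaining rows that did not fill a whole part
            written.append(
                _write_part(out_dir, source.stem, len(written) + 1, header, chunk)
            )
    return written
```

Calling `split_csv("input.csv", "parts")` (both paths are placeholders) would write `parts/input_part001.csv`, `parts/input_part002.csv`, and so on, each small enough to be processed as its own job.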

Here are a few tips to optimize your workflow:

  • Make sure you have decent hardware. The recommended hardware is 4 cores, 16 GB of RAM and 20 GB of free disk space. If you don’t meet that, it may be time to get better hardware.
  • Rework some datamapper/scripting logic to avoid unnecessary tasks, loops, etc.
  • Remove unused images and try to reduce the size of the ones you keep. Import static images into the document as opposed to loading them externally from a script.
  • If you are using multiple processes, or self-replicating processes, in your Workflow, make sure that you’re not creating so many that you end up using too much RAM or CPU. Multi-threaded processes can speed up jobs in some cases, but creating threads when they are not needed, or creating too many of them, can have the opposite effect.

These are some things you can look at to improve your solution. There are more advanced options as well, such as controlling the number of engines that are started through the Preferences under Scheduling, or tweaking the weaver engine’s settings, but those are more complicated and riskier changes that I wouldn’t recommend attempting without first trying the simpler suggestions mentioned above. If you still need assistance in improving the solution, it might be best to open a Technical Support call and have a technician look at your solution to see if anything can be improved.

Regards,
Raphaël Lalonde-Lefebvre

Hi Raphael,

It’s definitely not a hardware issue: the system runs on a clustered setup with high I/O and plenty of RAM and disk space (one can monitor that while processing), there are no self-replicating processes, and nothing else is running. Assigning 4 or 8 cores to the machine doesn’t make any difference. The number of engines isn’t an issue either, because only one of the three is used.

It’s always the datamapper (merge engine) that causes the trouble with complex jobs.

Anyway, thx for your answer, I’ll open a ticket for that.

…Remark:

I just debugged the processes a bit: the bottleneck is that all processes are waiting for the remote MS SQL Server (SQL Profiler shows a tremendous query load while the merge engine or weaver processes are running).

As I understand it, MS SQL has a fairly low limit on the number of arguments (parameters) it can handle in a single request (SQL Server caps this at 2,100), and large data mappers can quickly exceed that limit.

MySQL (which comes with Connect) has a similar limit, but it is much higher, allowing you to work with larger batches of data at a time. You might consider switching to MySQL instead.
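To make the effect of such a limit concrete, here is a small back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each record contributes one bound parameter per field; how Connect actually builds its queries is not known here, so the results are only indicative. The caps used are SQL Server’s 2,100 parameters per request and MySQL’s 65,535 placeholders per prepared statement; the function names and the figure of 20 fields per record are hypothetical.

```python
import math

# Parameter caps per request / prepared statement.
MSSQL_PARAM_LIMIT = 2100
MYSQL_PARAM_LIMIT = 65535


def max_records_per_request(param_limit, fields_per_record):
    """Largest number of records that fits under the parameter cap,
    assuming one bound parameter per field (illustrative assumption)."""
    if fields_per_record <= 0:
        raise ValueError("fields_per_record must be positive")
    return param_limit // fields_per_record


def requests_needed(total_records, param_limit, fields_per_record):
    """Number of requests needed to push all records through."""
    per_request = max_records_per_request(param_limit, fields_per_record)
    if per_request == 0:
        raise ValueError("a single record exceeds the parameter limit")
    return math.ceil(total_records / per_request)


if __name__ == "__main__":
    fields = 20      # hypothetical number of extracted fields per record
    records = 4000   # roughly the size of the CSV in this thread
    for name, limit in (("MS SQL", MSSQL_PARAM_LIMIT), ("MySQL", MYSQL_PARAM_LIMIT)):
        print(
            name,
            max_records_per_request(limit, fields),
            requests_needed(records, limit, fields),
        )
```

Under these assumptions, a lower cap translates directly into more round trips to the database, which would matter most when the server is remote.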

AlbertsN,

thx, I’ll give mysql a try, even if I’ve never seen a scenario where mysql beats mssql.

Ralf.