I am trying to use the Data Repository for temporary storage to do a bulk update into an SQL table once all records are complete.
The problem I am having is that when using the Push to Repository plugin or a simple lookup command in a Set Job Infos & Variables plugin in a threaded process, I get the error:
Error while executing plugin: unable to close due to unfinalized statements or unfinished backups.
Knowing that workflow plugins sometimes aren’t optimized for threading, I wrote a script that uses the SetValue function via the API. However, now there is a locking problem and I receive the error:
Database table is locked: Document.
Document is, of course, the name of my group (i.e. the table).
Please advise if there is a way to issue multiple simultaneous update commands without breaking the plugin or locking the SQLite files.
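For illustration only (generic Python sqlite3, not the Workflow Repository API, and the column names are made up), the pattern I keep coming back to is a single writer thread that serializes all updates so that only one statement ever touches the table at a time:

```python
import queue
import sqlite3
import threading

# All updates are funnelled through this queue so only one connection writes.
write_queue = queue.Queue()
STOP = object()  # sentinel used to shut the writer down

def writer(db_path):
    # One connection, one thread: writes are serialized, so SQLite never
    # reports a locked table because of concurrent writers.
    conn = sqlite3.connect(db_path)
    while True:
        item = write_queue.get()
        if item is STOP:
            break
        sql, params = item
        conn.execute(sql, params)
        conn.commit()
    conn.close()

# Processing threads never touch the database directly; they only enqueue
# their updates ("Document" is the group/table name from my process, the
# column names are hypothetical).
def record_document(name, status):
    write_queue.put(("UPDATE Document SET status = ? WHERE name = ?",
                     (status, name)))

worker = threading.Thread(target=writer, args=("repository.sqlite",))
worker.start()
# ...processing threads call record_document() here...
write_queue.put(STOP)
worker.join()
```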
>> Knowing that workflow plugins sometimes aren’t optimized for threading
It is true that some third-party plugins may not have been optimized for multi-threading, but those plugins are not shipped with the software and, from what you’re saying, that doesn’t apply here.
Quite the contrary: all plugins that do ship with the software are optimized for multi-threading. If one of them is not, it means it’s a bug that got past our development and QA processes. That bug should be reported to our Support team, who will ask you to provide sample resources so they can document it properly for R&D to fix.
I can’t offer you any help without having a look at your processes and scripts, so please open a call with our Support team to get this resolved as quickly as possible.
>> Knowing that workflow plugins sometimes aren’t optimized for threading
…this is simply what we are told by Support.
The last time I was here, I was asked to turn the threads down because the software can’t handle more threads than the number of logical CPUs on the hardware. I was therefore hoping there would be a similar rule of thumb for this scenario as well.
I will log a support call with UK Support immediately.
I think your last communication about the threads was just a misunderstanding. Workflow can handle tens, even hundreds, of concurrent threads. However, if some of your processes generate PDFs from within Workflow (this does not apply to PDFs created with the Connect Output), then it is recommended that you keep the number of such processes roughly equal to the number of CPU cores in your machine, because each PDF engine will use close to one entire core while generating PDFs. Consequently, if all of your cores are busy generating PDFs, the rest of the system may become sluggish.
Other than that, though, you should use parallel processes as much as you can!
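Purely to illustrate that sizing rule, here is a generic Python sketch (not anything Workflow does internally, and the job names are made up): the number of simultaneous CPU-heavy workers is capped at the core count so the rest of the machine stays responsive.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def render_pdf(job_name):
    # Stand-in for a CPU-bound rendering step; a real one would use a core
    # almost entirely while it runs.
    return job_name + ".pdf"

if __name__ == "__main__":
    jobs = ["job_%03d" % i for i in range(50)]
    # Cap the number of simultaneous CPU-heavy workers at the core count.
    max_workers = os.cpu_count() or 4
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(render_pdf, jobs):
            print(result)
```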
This is an interesting theory. In my experience, plugins such as Send to Folder and Metadata File Management fail when used in a highly threaded environment, with errors such as:
Metadata file not found.
A call to an OS function failed.
Error while copying metadata file. : Error code 32: The process cannot access the file because it is being used by another process.
I also experience a Send to Folder failure where it reports that it cannot find the path specified, even though Send to Folder should create the path when it is not already present. This happens in an environment where many threads before it complete successfully and then a few fail. It is almost as if the commands are being queued and the Send to Folder plugin tries to place the file before the Windows API has actually created the path.
More will become clear from the support ticket but PZ comes across this quite a lot in high-capacity instances. It is something which has been raised time and time again.
Hopefully you can point out where we are going wrong…
The Send To Folder issue is usually due to a timing problem within Windows itself: the plugin requests the creation of a folder (if it doesn’t already exist) before storing the file in it.
However, if there is a lot of disk I/O at that moment, the folder creation may take several milliseconds longer than it should (and sometimes up to a second or two). The problem, as we understand it, is that while the folder has been created by Windows, it has not yet been written to disk during that short period; it still appears to be available, so other processes may attempt to write to it… and will fail.
Unfortunately, these kinds of issues can only be addressed by ensuring, as much as possible, that folders are created before the processes are run. In addition, installing an SSD would greatly alleviate (and possibly eliminate completely) disk latency issues.
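As a rough illustration of that workaround (generic Python, with a made-up output path), the idea is to create the full folder path up front and, if a write still fails because the folder is not usable yet, retry briefly rather than give up immediately:

```python
import time
from pathlib import Path

def write_with_retry(target, data, attempts=5):
    # Create the full path up front, as recommended above.
    target.parent.mkdir(parents=True, exist_ok=True)
    for attempt in range(attempts):
        try:
            target.write_bytes(data)
            return
        except OSError:
            if attempt == attempts - 1:
                raise
            # Give the OS a moment to finish creating the folder on disk.
            time.sleep(0.2 * (attempt + 1))

write_with_retry(Path("output") / "invoices" / "batch01" / "doc001.pdf",
                 b"example data")
```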
As for the “A call to an OS function failed” error, we have found that it usually occurs while a script is running on the system. Mind you, it’s not the script that actually raises the error, but it does seem to cause it. We believe that the Microsoft scripting environment, which Workflow uses, may hog a number of resources, preventing other processes from completing their tasks in heavily threaded environments. You’d have to check your logs to see whether that theory holds true.
Finally, the metadata issues are strange, since it is impossible for any two tasks to use the same metadata file simultaneously: each process creates its own temporary copy, which is deleted along with the data file at the end of the process. This would therefore point to an external cause. If you run into these issues on a regular basis, I would definitely look at disk latency problems, very possibly caused by over-aggressive anti-virus policies that temporarily lock files while they are scanned. Again, on heavily threaded systems, an AV application may take much longer than usual to scan any single file, which will eventually cause trouble in Workflow.
To fix this, Workflow’s temporary work folders should be excluded from any AV scan.
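To show the kind of transient lock an AV scan can create, here is a generic defensive sketch (illustrative names and timings only, not something Workflow does itself) that retries a copy while the file is briefly held by another process:

```python
import shutil
import time

def copy_with_retry(src, dst, attempts=10, delay=0.5):
    # Retry while the file is briefly held by another process, e.g. while an
    # anti-virus scanner still has it open.
    for attempt in range(attempts):
        try:
            shutil.copy2(src, dst)
            return
        except OSError:
            if attempt == attempts - 1:
                raise  # still locked after all retries: surface the error
            time.sleep(delay)
```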