Looking for some advice on how to tackle this in PReS Connect.
I’m trying to build a web front end to set all the configuration parameters and upload a data file that then feeds into a mail barcoding process.
I’ve got a workflow working using a NodeJS Server Input accepting POST data from an HTML form.
It’s working to a point, but it times out on larger data files (60k+ records), as running these through our barcoding process can take up to 20 minutes. The process still finishes, but the web page times out with an error:
> This page isn’t working. sc-web03 didn’t send any data. ERR_EMPTY_RESPONSE
I’m just wondering if this is the best approach, or whether I should just be handing off the data file and a configuration file to a Folder Capture to run the process.
I would like to give the front end user some feedback on when the job is complete via the web front end, if possible.
Your approach is OK for jobs that don’t require more than a few seconds to process. For longer jobs, you need something a bit more sophisticated.
Here’s one possible approach, using 4 processes (I prefer having several smaller processes rather than a few larger ones):
Front end process has a NodeJS input and simply delivers the Web front end that will allow the user to upload a job. This process could use authentication if you want.
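If it helps, that upload page can be a plain HTML form. In the sketch below, the action URL, port and field names are placeholders for whatever your NodeJS Server Input is actually configured to accept:

```html
<!-- Minimal upload form; the action URL, port and field names are
     placeholders for whatever your NodeJS Server Input is set to accept. -->
<form action="http://sc-web03:9090/uploadjob" method="post"
      enctype="multipart/form-data">
  <label for="datafile">Data file:</label>
  <input type="file" id="datafile" name="datafile" required>
  <!-- Add inputs for whatever configuration parameters the barcoding
       process needs, e.g. a job description -->
  <input type="text" name="jobDescription" placeholder="Job description">
  <button type="submit">Submit job</button>
</form>
```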
Upload process has a NodeJS input that receives the job from the front end. It generates a unique name for the file, with a .JOB extension, and pushes that name into a new Group in the Workflow Repository. That Group contains a Key for the name of the file and a Key for its status. Initially, that status is set to “Submitted”. The process then simply returns a basic HTML response (e.g. “Job submitted successfully. ID: 23785623658.JOB. Click here to monitor status”).
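One way to do the repository bookkeeping is with a Run Script task. This is only a sketch: it assumes the uploaded data file is the current job file at that point, the group/key names and folder path are arbitrary, and the Repository API signatures (AddGroup, AddKeySets) should be checked against your Workflow documentation:

```javascript
// Run Script task (JScript) in the Upload process - a sketch only.
// Verify the Repository API method signatures in the Workflow docs.
var repo = new ActiveXObject("RepositoryLib.WorkflowRepository");

// Create the group on first run; ignore the error if it already exists.
try {
  repo.AddGroup("BarcodeJobs", '["JobFile","Status"]');
} catch (e) { /* group already exists */ }

// Generate a unique .JOB name (a timestamp is one simple option).
var jobName = new Date().getTime() + ".JOB";

// Register the job as Submitted.
repo.AddKeySets("BarcodeJobs",
  '{"JobFile": "' + jobName + '", "Status": "Submitted"}');

// Copy the uploaded data (assumed to be the current job file) into the
// capture folder under its unique name; C:\Jobs is a placeholder path.
var fso = new ActiveXObject("Scripting.FileSystemObject");
fso.CopyFile(Watch.GetJobFileName(), "C:\\Jobs\\" + jobName);

// Keep the name in a Job Info so a later task can build the HTML
// confirmation page around it.
Watch.SetJobInfo(9, jobName);
```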
Monitoring process has a NodeJS input (that’s the process the response above points to). The process gets the list of jobs from the repository and simply displays their name and status. The HTML page generated by the process and returned to the user should be set to refresh every few seconds so that the status for all jobs is automatically updated.
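A rough sketch of that monitoring script, under the same assumptions about the Repository API (in particular, it assumes GetKeySets returns a JSON array of key sets; verify the signature and return format in the documentation):

```javascript
// Run Script task (JScript) in the Monitoring process - a sketch only.
var repo = new ActiveXObject("RepositoryLib.WorkflowRepository");
var jobs = eval(repo.GetKeySets("BarcodeJobs", '["JobFile","Status"]', ""));

// Build a page that reloads itself every 5 seconds.
var html = '<html><head><meta http-equiv="refresh" content="5"></head><body>';
html += '<h1>Job status</h1><table><tr><th>Job</th><th>Status</th></tr>';
for (var i = 0; i < jobs.length; i++) {
  html += '<tr><td>' + jobs[i].JobFile + '</td><td>' + jobs[i].Status + '</td></tr>';
}
html += '</table></body></html>';

// Overwrite the job file with the HTML so the NodeJS input returns it
// to the browser at the end of the process.
var fso = new ActiveXObject("Scripting.FileSystemObject");
var f = fso.CreateTextFile(Watch.GetJobFileName(), true);
f.Write(html);
f.Close();
```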
Main process has a Folder Capture input that captures files with a .JOB extension from the storage folder. It uses the file’s name to update the Repository status for that file to “processing…”. It then does its barcoding thing and when it’s done, it sets the Repository status for that file to “Completed”.
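The status updates at either end of the Main process could again be small Run Script tasks, along these lines (the %O variable for the captured file’s original name and the SetValue parameter order are assumptions to verify against your version’s documentation):

```javascript
// Run Script task (JScript) at the top of the Main process - a sketch only.
var repo = new ActiveXObject("RepositoryLib.WorkflowRepository");
var jobName = Watch.ExpandString("%O"); // assumed: original captured file name

// Flag the job as being processed.
repo.SetValue("Status", "BarcodeJobs", "Processing...",
  "JobFile='" + jobName + "'");

// ... the barcoding tasks run after this script ...

// A second Run Script at the end of the process flips the status:
// repo.SetValue("Status", "BarcodeJobs", "Completed",
//   "JobFile='" + jobName + "'");
```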
At some point you would need to add a 5th process that runs once a day and cleans up all “Completed” entries from the repository. Obviously, you could run it more or less frequently. A sketch follows below.
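That cleanup can be a near one-liner against the repository; again, verify the exact RemoveKeySets signature in the Workflow documentation before relying on it:

```javascript
// Run Script task (JScript) in the scheduled cleanup process - a sketch only.
// RemoveKeySets deletes every key set matching the condition.
var repo = new ActiveXObject("RepositoryLib.WorkflowRepository");
repo.RemoveKeySets("BarcodeJobs", "Status='Completed'");
```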
This kind of architecture allows for growth, and also allows you to eventually implement interesting features (you could store the time it took for each job to process, or any other statistics you would find useful in a monitoring page). You could eventually beef up that monitoring page so that it allows you to prioritize or remove jobs that have not yet been processed, etc.
Thanks for this info. Would the initial form need to be served up by a NodeJS input, or could it be a web page that just submits to the upload process? This is how I initially set this up, with the form on our intranet site posting to the NodeJS input. Could you see any issues with this setup?
Would the approach you outline above allow multiple jobs to process at the same time, or would they queue and process sequentially?
Of course you can send the initial form from a static web page. You don’t have to serve that front end through a workflow process.
Phil’s approach also supports processing multiple jobs, as long as you pay attention to the unique name for each incoming job. Just set your “Upload process” to “self-replicating” in the process properties. In that case the process is “duplicated” for each incoming job, so you can handle multiple requests in parallel. How many parallel processes can be handled depends on your hardware and on the process properties (Max percentage of threading). You should also set the “Polling interval” in the process properties to “0”.
That way you can receive multiple incoming requests, store the incoming files in a folder, and store the job information in the repository.
To process them in parallel, you also have to set your “Main process” to “self-replicating”, but keep an eye on performance.