Output Separations

TGREER · October 15, 2024, 10:18am

Two scenarios come up frequently when I need to use Output Separations.

1. I need to know before running if there will be duplicate files. It’s a mess when a large job fails 90% of the way through because of a duplicate filename.
2. I need to know when the output is completed.

For the first issue, it would be great if there were a “validation” step or mode that checked for dupes before starting the output.

For the second, I like the performance of starting a Separation output step that runs asynchronously and lets the rest of the Process continue. Often, though, there is a housekeeping step afterwards (for example, copying all of the output to an archive location, or zipping all of the output and FTPing it somewhere). Is there a REST call or some other signal that can be checked to see whether the Separation step has completed or is still running?

Phil · October 15, 2024, 7:42pm

For the first part, it really depends on how you’re naming each output file.
If it’s based on a field value, for instance, then you could check for duplicate field values immediately after the data mapping step (a simple Workflow script could take care of that).

If it’s based on a more elaborate condition, you’d probably have to first store the final output name as an individual field in the data model, and then use that simple script mentioned above.
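As a rough illustration of that duplicate check, here is a minimal sketch in plain Python. It is not a Workflow script and does not use any Connect API: the `records` list and the `output_name` field are hypothetical stand-ins for whatever field holds the final file name in your data model. The case-insensitive comparison is also an assumption, made because Windows file systems typically treat `A.pdf` and `a.pdf` as the same file.

```python
from collections import defaultdict


def find_duplicate_names(records, name_field="output_name"):
    """Return {normalized name: [record indexes]} for names used more than once.

    `records` is a list of dicts; `name_field` is a hypothetical field that
    holds the final output file name for each record.
    """
    seen = defaultdict(list)
    for index, record in enumerate(records):
        # Assumption: names are compared case-insensitively and without
        # surrounding whitespace, as most Windows file systems would.
        key = record[name_field].strip().lower()
        seen[key].append(index)
    # Keep only the names that appear on more than one record.
    return {name: indexes for name, indexes in seen.items() if len(indexes) > 1}


# Illustrative data only: the first and third records collide.
records = [
    {"output_name": "INV-1001.pdf"},
    {"output_name": "INV-1002.pdf"},
    {"output_name": "inv-1001.pdf"},
]
duplicates = find_duplicate_names(records)
```

If `duplicates` is not empty, the process could stop before the output step starts and report which records share a name.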

For the second issue, there are REST calls that allow you to monitor the progress of any operation and its final result (e.g. Get Result of Operation). But in order to get that information, you must know the Operation ID, which is only available as the result of the REST calls that launched the operation.

So you’d have to perform the output step yourself, using REST calls. That’s not an overly difficult task, but it does require you to be familiar with the various requirements of the REST API (authentication, resource handling, operation sequence, etc.).
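The monitoring part of that sequence boils down to a polling loop. The sketch below shows only the loop, under stated assumptions: `get_status` is a hypothetical callable that you would write yourself to wrap the REST call reporting on your Operation ID, and the status strings `"done"` and `"failed"` are placeholders, not values returned by the actual API.

```python
import time


def wait_for_operation(get_status, poll_seconds=2.0, timeout_seconds=600.0):
    """Poll `get_status()` until it reports a final state or the timeout expires.

    `get_status` is a hypothetical zero-argument callable returning a string.
    "done" and "failed" are placeholder final states for this sketch.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        # Wait before asking again so the server is not flooded with requests.
        time.sleep(poll_seconds)


# Usage with a fake status source standing in for the real REST call.
fake_statuses = iter(["running", "running", "done"])
result = wait_for_operation(lambda: next(fake_statuses), poll_seconds=0)
```

The housekeeping step would then run only when the returned status indicates success.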

Note that even if you were to do this, you’d still have to account for potential latency: the operation may be reported as completed while the OS is still flushing the last file to disk, which could prevent your housekeeping step from picking up that file. This is less likely with the speed of today’s SSDs and NVMe drives, but it’s still a possibility, so you’d have to introduce some artificial delay into your housekeeping step to account for it.
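One variation on a fixed delay, which is my own suggestion and not something built into the product, is to wait until the last file's size stops changing before starting the housekeeping. The sketch below is a heuristic under that assumption: the function name and its parameters are hypothetical, and an unchanged size does not strictly prove that the writer has closed the file.

```python
import os
import time


def wait_until_stable(path, checks=3, interval_seconds=1.0, timeout_seconds=60.0):
    """Return True once the file size is unchanged across `checks` consecutive
    comparisons, or False if that does not happen before the timeout."""
    deadline = time.monotonic() + timeout_seconds
    last_size = -1
    stable_count = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # The file does not exist yet or cannot be read.
        if size >= 0 and size == last_size:
            stable_count += 1
            if stable_count >= checks:
                return True
        else:
            # The size changed (or the file is missing): start counting again.
            stable_count = 0
        last_size = size
        time.sleep(interval_seconds)
    return False
```

A housekeeping script could call `wait_until_stable` on the last expected output file and only start copying or zipping when it returns `True`.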