How can I check for Duplicate Files when using Workflow

Hi,

I am setting up Workflow to automatically pull .XML files from an FTP site and run them through the data mapper, template, job creation and output creation. Is there a way to check whether I am picking up duplicate files? The same file might be uploaded to the FTP site twice by mistake, and we want to catch this before it is rendered to PDF output files.

Let me know if I can provide any further information.

Thanks,

Hello TSled,

There are two ways I can think of to do this. The first is to save a copy of each incoming XML file into a specific backup folder, after first checking whether a file with the same name already exists there.

You can do this with the File Size Condition task (under Process Logic), checking whether the file with that name in the backup folder is larger than 0 bytes. If it is, the incoming file is a duplicate, so simply add a “Delete” task in that branch (you could send some sort of notification instead, but that's up to you). If the condition is false (that is, there is no such file in the folder), use Send to Folder as an action to save a copy of the file in that location and let the process continue.
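To make the branching logic of the folder method concrete, here is a minimal Python sketch of the same check done outside Workflow. It is only an illustration under stated assumptions: `handle_incoming` is a hypothetical helper, not a Workflow task or API, and duplicates are detected purely by file name, exactly as in the folder check described above.

```python
import shutil
import tempfile
from pathlib import Path


def handle_incoming(incoming: Path, backup_dir: Path) -> bool:
    """Return True if the file is new and should continue to the data mapper.

    Hypothetical helper illustrating the folder-based duplicate check;
    it is not part of Workflow.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_copy = backup_dir / incoming.name

    # Equivalent of the File Size Condition: a same-named copy larger than
    # 0 bytes in the backup folder means this file was already processed.
    if backup_copy.exists() and backup_copy.stat().st_size > 0:
        incoming.unlink()  # equivalent of the "Delete" branch
        return False

    # Equivalent of Send to Folder: keep a copy for future comparisons.
    shutil.copy2(incoming, backup_copy)
    return True


if __name__ == "__main__":
    # Simulate the same file arriving twice from the FTP site.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        backup = base / "backup"
        for attempt in (1, 2):
            xml = base / "invoice_001.xml"  # hypothetical file name
            xml.write_text("<invoices/>")
            print(attempt, handle_incoming(xml, backup))
```

The first arrival is copied to the backup folder and allowed through; the second arrival with the same name is deleted. Because the comparison is by name only, a different file that happens to reuse an old name would also be treated as a duplicate.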

The disadvantage of this method is that you keep a copy of every file in that folder, which may become a problem depending on the size of the files and the space available on your disk. The advantage is that it is the simpler of the two methods.

The second way is to use a tracking database. You would need to create a database that stores, at the very least, the name of each file (it could also store data contained within the file if necessary). A Database Action task then queries that database to see whether there is a result, and the branching works much like the method above: the true branch (a record already exists) does nothing, while the false branch writes the file name to the database and continues the process.
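The query-then-write logic of the tracking database can be sketched in Python as follows. This is a hedged illustration, not Workflow configuration: SQLite is used only because it ships with Python (your tracking database may well be a different product), and the table `processed_files`, its columns, and the function names are all hypothetical. The optional `duplicates` counter shows one way to track how many duplicates arrive.

```python
import sqlite3


def open_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a hypothetical tracking table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS processed_files (
               filename   TEXT PRIMARY KEY,
               duplicates INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()
    return conn


def is_new_file(conn: sqlite3.Connection, filename: str) -> bool:
    """Return True the first time a file name is seen, False for duplicates."""
    # Equivalent of the Database Action query: does a record already exist?
    row = conn.execute(
        "SELECT 1 FROM processed_files WHERE filename = ?", (filename,)
    ).fetchone()

    if row is not None:
        # True branch: the file was seen before. Incrementing the counter is
        # optional; it just records how many duplicates have arrived.
        conn.execute(
            "UPDATE processed_files SET duplicates = duplicates + 1 "
            "WHERE filename = ?",
            (filename,),
        )
        conn.commit()
        return False

    # False branch: record the file name, then let processing continue.
    conn.execute("INSERT INTO processed_files (filename) VALUES (?)", (filename,))
    conn.commit()
    return True
```

For example, with `conn = open_tracking_db("tracking.db")`, the first call to `is_new_file(conn, "invoice_001.xml")` returns `True` and any later call with the same name returns `False`. The parameterized `?` placeholders keep file names from being interpreted as SQL.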

The disadvantage of the database method is that it is a little more complex to implement (you have to create the database and write the SQL queries that check it and add to it). The advantage is that it takes little disk space and not much processing power, and if you want, you could even track how many duplicates you receive.

Hope this helps,
~Evie