Extracted DataMapper output (as CSV) should be enriched by an external process, then sent back to the same or another workflow.
Reading a PDF, extracting some data, sorting, adding some data and then "stamping" some of that new data into those documents.
Example:
Doc (PDF) consisting of several addresses stored in extracted fields.
External (manually created) geo-coordinates for each document ID, along with a geo-map image (and/or just plain text).
The next process should insert that image (or some text) into the original PDF.
How can I reference the original metadata in a subsequent process?
Not sure I correctly understand your request, but it would seem like a fairly simple procedure:
1. Make sure that in your initial DataMapper configuration, you have already created two fields (e.g. "geo_coordinates" and "geo_map"). Those fields will be empty in all records since the original data doesn't contain those values, but that's OK since you will be adding the values through Workflow.
2. In Workflow, after the data is extracted, use a script to store the geo coordinates and the name of the image file into those two fields in the metadata (this script obviously depends on where/how that information is stored externally).
3. When executing the Create Content task, make sure to tick the "Update Records from metadata" option, which ensures that the values you added to the metadata are stored back into the database's original records.
This is a very high-level view of how to achieve what you want, but it should hopefully get you going.
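To make the "enrich, then write back" idea concrete outside of Workflow's own scripting objects, here is a minimal Python sketch of the lookup step. The `document_id` key, the `geo_coordinates`/`geo_map` field names, and the lookup data are assumptions taken from the example above, not part of any Connect API:

```python
# Hypothetical external lookup: document ID -> (geo coordinates, map image name).
# In a real setup this would come from the manually created geo data.
geo_lookup = {
    "DOC-001": ("52.5200,13.4050", "doc-001-map.png"),
    "DOC-002": ("48.1351,11.5820", "doc-002-map.png"),
}

def enrich(records):
    """Fill the empty geo_coordinates/geo_map fields from the external lookup."""
    for rec in records:
        coords, map_image = geo_lookup.get(rec["document_id"], ("", ""))
        rec["geo_coordinates"] = coords
        rec["geo_map"] = map_image
    return records

# Records as they come out of the DataMapper: the two extra fields exist but are empty.
records = [
    {"document_id": "DOC-001", "geo_coordinates": "", "geo_map": ""},
    {"document_id": "DOC-002", "geo_coordinates": "", "geo_map": ""},
]
enrich(records)
print(records[0]["geo_map"])  # doc-001-map.png
```

In Workflow itself, the same logic would live in a Run Script task writing to the metadata fields instead of a Python dictionary.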
I. First workflow:
3. XSLT transformation of the temp XML file to CSV, with the ID and those empty fields.
4. Store that CSV in a separate MySQL schema table (named after the original filename) (LOAD DATA INFILE… extremely fast).
5. Export the metadata and the original file to a temp folder.

Offline process: enrich that CSV.

II. Second workflow, awaiting that "enriched" CSV:
1. Load the CSV into a temporary MySQL table using LOAD DATA INFILE, then update the MySQL table from I.4 via an INNER JOIN (also incredibly fast).
2. Fetch the original file and metadata (I.5) with Folder Capture and Metadata File Manager.
3. Iterate through the metadata, fetching the values from the MySQL table and writing them to the empty metadata fields: thisDoc.Fields.Add("_vger_fld_myfield", mysqlrecordset("myfield"))
4. Create Print Content (Update Records from Metadata).
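The staging-table update in II.1 can be illustrated with SQLite standing in for MySQL; the table and column names are assumptions, and in the real workflow both tables would be filled via LOAD DATA INFILE rather than INSERTs. The point is the join/update pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Main table created in step I.4: one row per record, geo fields still empty.
cur.execute("CREATE TABLE job_table (id INTEGER PRIMARY KEY, geo_coordinates TEXT, geo_map TEXT)")
cur.executemany("INSERT INTO job_table VALUES (?, '', '')", [(1,), (2,)])

# Temporary staging table holding the enriched CSV (MySQL: LOAD DATA INFILE).
cur.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, geo_coordinates TEXT, geo_map TEXT)")
cur.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                [(1, "52.5200,13.4050", "map1.png"), (2, "48.1351,11.5820", "map2.png")])

# Update the main table from the staging table.
# MySQL would use UPDATE job_table INNER JOIN staging ... SET ...;
# SQLite expresses the same thing with correlated subqueries.
cur.execute("""
    UPDATE job_table
    SET geo_coordinates = (SELECT s.geo_coordinates FROM staging s WHERE s.id = job_table.id),
        geo_map         = (SELECT s.geo_map         FROM staging s WHERE s.id = job_table.id)
    WHERE id IN (SELECT id FROM staging)
""")
con.commit()
print(cur.execute("SELECT geo_map FROM job_table WHERE id = 1").fetchone()[0])  # map1.png
```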
and there it fails, metadata is looking good, but
[0010] W3001 : Error while executing plugin: HTTP/1.1 500 There was an error running the content creation process caused by ApplicationException: No record found with ID 1809031 (SRV000022)
I don’t know where I scrambled the record IDs since I’m reading the original file back (II.2).
Any idea?
Ralf.
Addendum: it works with smaller datasets, but with 20k rows (input PDF with 180k pages) the above error occurs.
Hard to say, but one thing to look at is not the length of the job itself, but rather the duration of the entire process. You don't mention whether these steps are performed immediately one after the other or over a period of hours or days.
Check that your cleanup service isn’t running in between processes as this could explain missing records. You can change the frequency of the cleanup process in the Server Preferences. The easiest way to test if the service is the source of the issue is to disable it for a while and see if all your jobs are processed properly. If they are, then it means the service runs too frequently, so you should adjust its schedule.