Metadata Level Creation Performance Question

ssime · November 8, 2019, 4:53pm

I am working with a PDF data file which contains about 65,000 records.
I have created and sorted the metadata at document level without issues.
Now I need to group the metadata whenever the City field value changes.
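For reference, the grouping rule itself is a single pass over the already-sorted records: start a new group whenever City differs from the previous record. The Python sketch below only illustrates that logic on plain dicts; it is not the PlanetPress Metadata API, and the field names and sample values are made up.

```python
from itertools import groupby


def group_on_change(records, key="City"):
    """Split records (already sorted by `key`) into groups, starting a new
    group each time the value of `key` differs from the previous record."""
    return [list(grp) for _, grp in groupby(records, key=lambda r: r[key])]


# Made-up sample data, sorted by City
records = [
    {"CustomerNumber": "C001", "City": "Leeds"},
    {"CustomerNumber": "C002", "City": "Leeds"},
    {"CustomerNumber": "C003", "City": "York"},
]
groups = group_on_change(records)
print([len(g) for g in groups])  # [2, 1]
```

`groupby` only merges consecutive equal keys, which is why the sort has to happen first; each record is visited once, so the cost grows linearly with the record count.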

The metadata group-level creation takes over an hour to complete, and Workflow's memory usage rises to about 1.6 GB.
Following that, there is a Metadata Fields Management task, which is required to add the City field at the group level.
In the Metadata Fields Management task there is also a requirement to add a field at the group level which is the SUM of another field across the documents in each group. This part works fine.
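Copying City up to the group and summing a document-level field can both be done while the groups are being built, so no second pass over the metadata is needed. A minimal Python sketch of that idea, assuming records are plain dicts sorted by City and that the summed field is a hypothetical integer field called `Amount`:

```python
from itertools import groupby


def build_groups(records, key="City", sum_field="Amount"):
    """Group consecutive records by `key`; copy the key to group level and
    add a group-level total of `sum_field` in the same pass."""
    groups = []
    for value, grp in groupby(records, key=lambda r: r[key]):
        docs = list(grp)
        groups.append({
            key: value,                                            # group-level City
            "Total" + sum_field: sum(d[sum_field] for d in docs),  # group-level SUM
            "documents": docs,
        })
    return groups


records = [
    {"City": "Leeds", "Amount": 120},
    {"City": "Leeds", "Amount": 80},
    {"City": "York", "Amount": 50},
]
for g in build_groups(records):
    print(g["City"], g["TotalAmount"], len(g["documents"]))
```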

This step also takes over an hour to complete and keeps memory usage at about 1.6 GB, so I am worried the process will fall over in production when running with a much greater number of records. Unfortunately, I can't split the data, as it needs to be sorted, grouped and then barcoded for the inserter.

Are there any ways to optimise the Metadata Group creation whilst allowing me to add and sum the required fields?

I am told a metadata API exists which might speed things up, but quite frankly I have no experience with it.

Perhaps if someone knows of an existing example script which groups metadata when a field value changes, I could use it as a starting point?

Thank you.

Phil · November 8, 2019, 7:35pm

Wouldn’t it be simpler (and much more efficient) to use a Job Preset and do all of that in the Job Creation task instead? You wouldn’t have to mess around with the metadata (actually, you’d probably no longer need the metadata at all).

ssime · November 8, 2019, 10:54pm

Can I use Job Presets without a Connect template?
I think that would be a tremendous feature, but from what I have noticed there seems to be a disconnect between the Workflow metadata and the Job Preset metadata.

I am not currently adding content to the PDFs in Connect, so I don't need a data mapper or a template.
I'm just sorting and grouping by city. I also need to create a bdf file from the sorted PDF, which is essentially a text file containing some information on each record, such as customer number, sheet count per group, page count per document and total spend per city (group). The bdf file is required by our barcoding application.
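The thread does not give the bdf layout, so the semicolon-separated line below is purely a placeholder. The sketch shows the two-pass shape such a writer could take: first total sheets and spend per city, then emit one line per record. It assumes duplex output (two pages per sheet) and spend stored in integer cents; both are assumptions.

```python
import math


def sheets_for(pages, duplex=True):
    # Assumption: duplex printing, i.e. two pages per sheet.
    return math.ceil(pages / 2) if duplex else pages


def bdf_lines(records, duplex=True):
    """One line per record: customer number, city, page count of the
    document, sheet count of its city group, total spend of the group.
    `records` are dicts sorted by City; Spend is in cents."""
    totals = {}
    for r in records:  # pass 1: per-city totals
        t = totals.setdefault(r["City"], {"sheets": 0, "spend": 0})
        t["sheets"] += sheets_for(r["Pages"], duplex)
        t["spend"] += r["Spend"]
    lines = []
    for r in records:  # pass 2: one output line per record
        t = totals[r["City"]]
        fields = [r["CustomerNumber"], r["City"], str(r["Pages"]),
                  str(t["sheets"]), str(t["spend"])]
        lines.append(";".join(fields))
    return lines


records = [
    {"CustomerNumber": "C001", "City": "Leeds", "Pages": 3, "Spend": 1000},
    {"CustomerNumber": "C002", "City": "Leeds", "Pages": 4, "Spend": 500},
    {"CustomerNumber": "C003", "City": "York", "Pages": 1, "Spend": 250},
]
print("\n".join(bdf_lines(records)))
```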

The sorted PDF will then be fed to an external application for barcoding, as it's capable of applying a Data Matrix to 25,000 pages per minute.

The average job file is about 300,000 pages. The All-in-One in Connect also takes significantly longer and raises memory errors, even though my system has 24 GB of RAM and I have given the Weaver engine 10 GB and each of the 4 Merge engines 2 GB.

The Workflow metadata seems more stable and uses less memory, but it is just very slow. I would like to script the metadata group creation if I can find out how to use the Metadata API.

ssime · November 10, 2019, 12:40am

I used a Connect data mapper, template and Job Preset as suggested, but I am having an issue grouping the job by the total weight of each city.

I need to further group all cities whose individual weight is under 10 kg, so that if city A weighs 8 kg, city B 6 kg and city V 3 kg, they are all combined into one batch.
If the total weight of a city's invoices is greater than 10 kg, that city's invoices need to be output as a batch of their own.

In a Job Preset it's only possible to group by page count or sheet count; there is no way to sum the values of a field, such as the individual weight of each invoice, and then use this sum to create different groups (under 10 kg, between 10 and 20 kg, and over 20 kg).
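The batching rule described above can be sketched as: total the invoice weights per city, give every city at or over the limit its own batch, and put all lighter cities into one shared batch. This is a Python illustration only. Weights are kept as integer grams to avoid rounding, the treatment of exactly 10 kg and the 10 to 20 kg band boundaries are assumptions, and city D is an extra made-up heavy city.

```python
def city_weights(invoices):
    """Total weight per city in grams; dict keeps first-seen city order."""
    weights = {}
    for inv in invoices:
        weights[inv["City"]] = weights.get(inv["City"], 0) + inv["WeightGrams"]
    return weights


def weight_band(grams):
    # Assumed boundaries: under 10 kg, 10-20 kg inclusive, over 20 kg.
    if grams < 10_000:
        return "under 10kg"
    if grams <= 20_000:
        return "10-20kg"
    return "over 20kg"


def make_batches(invoices, limit_grams=10_000):
    """Cities at or over the limit get their own batch; all lighter
    cities are combined into a single shared batch."""
    weights = city_weights(invoices)
    batches = [[city] for city, w in weights.items() if w >= limit_grams]
    light = [city for city, w in weights.items() if w < limit_grams]
    if light:
        batches.append(light)
    return batches


invoices = [
    {"City": "A", "WeightGrams": 5_000},
    {"City": "A", "WeightGrams": 3_000},
    {"City": "B", "WeightGrams": 6_000},
    {"City": "D", "WeightGrams": 12_000},
    {"City": "V", "WeightGrams": 3_000},
]
print(make_batches(invoices))  # [['D'], ['A', 'B', 'V']]
print(weight_band(12_000))     # 10-20kg
```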

I recall this was the main reason I opted to use the Workflow metadata only.
So how can I optimise the Metadata Fields Management task in Workflow? I think this is where most of the bottleneck occurs.

ssime · November 10, 2019, 1:52pm

I found the documentation for the PlanetPress Metadata API. This has enabled me to make some progress. I replaced the Metadata Fields Management task with a script, and this step went from taking 1h28min to 0.028 seconds!!! It looks like the Metadata Fields Management task unnecessarily loops through all the records when I only needed to add fields to the first group matching my condition. There is no way to break out of the loop within the plugin, and I think that's a missing feature.

Speaking of missing features, I also feel the metadata-related plugins have not evolved to let users manipulate the Connect metadata. I can't get something as simple as the page count of a document: it always comes back as 1 for every document, even after the Create Content or Create Job step. These plugins should be able to query the Connect database in order to load the correct metadata. My guess is that one is supposed to use the Connect REST API for this? Why? It just makes it so difficult to use!

All I can say is that, as a user, overall I am not getting the best experience trying to exploit Connect Designer features and metadata in Workflow.

Sharne · November 11, 2019, 7:54am

I totally agree with all your points regarding Connect/Workflow metadata, and definitely agree with your findings about the metadata plugins. They are the slowest in the world. I followed this how-to YEARS ago and the process ran for around 15+ hours. OL ended up supplying a script. Those plugins need some attention, and Workflow metadata needs to be reworked, but the latter seems to be a big task for OL.

Regards,
S

ssime · November 11, 2019, 8:38am

Sharne,
Thank you for pointing out the link. I didn't know the guide existed; I was actually trying to add a cover page which includes the number of sets (cities), the weight of each city, the number of pages and sheets per set, etc.

I actually approached the task in the same way as this guide (without even knowing it existed!).
My issue was with this condition: (GetMeta(SelectedIndexInJob[0], 11, Job.Group[?].Document[?]) (#)Equal 0)
I believe the plugin unnecessarily loops through all the Document and Group nodes in the metadata to evaluate the condition.
I wrote a simple script which adds the metadata fields directly to the first document node of the first group only, without any condition or loop, and this shaved off a huge amount of processing time. My process now takes 1h10min instead of 11 hours. I am still looking for ways to optimise this further.
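To see why the direct approach is so much cheaper, assume (as suspected above) that the plugin evaluates the condition once per document node. The Python model below treats a job as a list of groups, each a list of documents represented as field dicts. It is not the PlanetPress object model, and `SetCount` is a hypothetical field name. Testing every node is O(n) in the number of documents, while going straight to the first document of the first group touches one node.

```python
def add_fields_by_condition(job, new_fields):
    """Test a 'first document of first group' condition on every document,
    the way a per-node rule would. Returns the number of nodes visited."""
    visited = 0
    for g, group in enumerate(job):
        for d, doc in enumerate(group):
            visited += 1
            if g == 0 and d == 0:
                doc.update(new_fields)
    return visited


def add_fields_direct(job, new_fields):
    """Go straight to the first document of the first group: one node touched."""
    job[0][0].update(new_fields)


# Hypothetical job: 3 groups of 1,000 documents, each document a dict of fields.
job = [[{} for _ in range(1000)] for _ in range(3)]
print(add_fields_by_condition(job, {"SetCount": 3}))  # 3000
add_fields_direct(job, {"SetCount": 3})
print(job[0][0])  # {'SetCount': 3}
```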

As for basic metadata properties such as page and sheet count, page properties and finishing options, I find it unacceptable in this day and age that one has to write complex REST API scripts to retrieve these values. They should be accessible in the metadata. I hope OL are working on new metadata plugins which are compatible with Connect.

I have said it before: as a user, I would like to spend more time designing and getting jobs out of the door ASAP instead of looking for ways to work around limitations and bugs. At the moment, sadly, the latter is mostly true for me.

Sharne · November 11, 2019, 10:29am

ssime,

I needed a splitter page added when a postal code changed value. Two things were missing in Connect: a splitter page with variable data (which I was told was coming in an update; status: NO), and the On Change option from PlanetPress Suite 7 (which I was told was coming in an update; status: NO).

I also needed to count how many records shared the same postal code, as the statements ran in that sequence. After messing around with the data mapper, I figured out a way to detect an On Change (with a boolean value), *count the postal codes and display the previous postal code counted. Then, based on the boolean On Change field, I enabled/disabled a Splitter Section by making it conditional.
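The two values described here, an On Change flag and a count per postal code, can be illustrated in Python on a plain list of postal codes in print order. This is a language-neutral sketch, not data mapper or Workflow code. The flag is True on the first record of each run of equal codes; the count is the length of that run, which is only known once the run ends. That is why the VBScript posted further down defers `setDocumentCounts` until the next group starts.

```python
from itertools import groupby


def annotate_on_change(postcodes):
    """For postal codes in print order, return (code, on_change, group_total):
    on_change is True on the first record of each run of equal codes, and
    group_total is the number of records in that run."""
    out = []
    for code, run in groupby(postcodes):
        n = len(list(run))  # run length is only known after the whole run is read
        for i in range(n):
            out.append((code, i == 0, n))
    return out


for row in annotate_on_change(["2000", "2000", "2000", "3000", "3100", "3100"]):
    print(row)
```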

I need to edit this post from a while ago, as it is now outdated with regard to counts IIRC, and I have since improved it.

But yes, I agree, the number of workarounds needs to be addressed.

Regards,
S

*Disclaimer - my new count method does have a small issue that I have yet to find a solution for. Perhaps posting it would allow others to apply a fix for it.

jbeal84 · December 19, 2021, 10:48am

Hi Sharne

I know this was some time ago, but do you have a copy of that script? I'm trying to get this to work and am having problems even with a small file, and I'd find it much easier to follow/use/edit a script.

James

Sharne · February 15, 2022, 10:18am

Hi jbeal84,

Been a bit since I visited here. Here is the script I mentioned a few posts up.

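' Load the incoming metadata and create an empty output metadata file
Set MyMeta = CreateObject("MetadataLib.MetaFile")
MyMeta.LoadFromFile Watch.GetMetadataFileName

Set MyMetaOut = CreateObject("MetadataLib.MetaFile")
' All documents are read from the first group of the input job
Set MyGroup = MyMeta.Job.Group(0)
'currPCode = "xxx"

' Name of the original data file, stored on each output document
myDataFile = Watch.ExpandString("%O")
set myNewGrp = nothing
for each MyDoc in MyGroup
  ' New group for the very first document, and whenever the postal code changes
  isNewGroup = (MyNewGrp is nothing) or (currPCode <> MyDoc.FieldByName("_vger_fld_PCode"))
  if isNewGroup then
    currPCode = MyDoc.Fields.ItemByName("_vger_fld_PCode")
    ' The previous group is complete: stamp its document count on each of its documents
    if not (myNewGrp is nothing) then
      setDocumentCounts MyNewGrp
    end if
    set MyNewGrp = MyMetaOut.Job.Add(MyMetaOut.Job.Count)
  end if
  ' Copy the document node into the current output group and add the splitter fields
  MyDoc.Copy
  Set MyNewDoc = MyNewGrp.Paste
  MyNewDoc.Fields.Add "_vger_fld_Splitter_PageRequired", lcase(isNewGroup)
  MyNewDoc.Fields.Add "_vger_fld_Splitter_PCode", MyNewDoc.FieldByName("_vger_fld_PCode")
  MyNewDoc.Fields.Add "_vger_fld_Data_Name", myDataFile
  'MyNewDoc.Fields.Add "_vger_fld_Splitter_PCode_Total", MyNewDoc.FieldByName("_vger_fld_Splitter_PCode_Total")
  'MyNewDoc.Fields.Add "_vger_fld_HidePostImages", true
next

' The loop only stamps a group when the next one starts, so stamp the last group here
if not (MyNewGrp is nothing) then
  setDocumentCounts MyNewGrp
end if

set MyNewDoc = nothing
set MyNewGrp = nothing
set myMeta = nothing
MyMetaOut.SaveToFile Watch.GetMetadataFileName

' Adds the number of selected documents in Group to every document in that group
sub setDocumentCounts(Group)
  for each Doc in Group
    Doc.Fields.Add "_vger_fld_Splitter_PCode_Total", Group.SelectedCount
  next
end sub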

jbeal84 · February 16, 2022, 9:14am

Cheers, thanks for that.

edanting · September 26, 2024, 2:47am

Hi,

Where can I find the Metadata API scripting documentation?

I can't seem to find it. Also, is it possible to write the metadata with a script rather than outputting it through the data mapper?

We keep getting memory issues with big jobs - and it seems to be coming from datamapper whilst outputting the metadata.

(Btw we need the metadata file to generate reports).

Phil · September 26, 2024, 4:50am

Metadata API