Gen PDF --> Metadata --> Index --> DB?

Hello everyone, how are you?
My process takes as input an XML file with many invoices, some of which are 2 pages long, and produces a single PDF containing all the invoices.
I need to generate a CSV index file to load into the database. The index fields are:
invoice_number ()
number_of_pages (number of pages that the invoice has)
init_page (the position of the invoice’s first page within the combined PDF)

This process was built in PP Suite 7.6 with the following scheme:
Create Metadata
Metadata Level Creation (defined the document boundaries, one document per invoice)
Metadata Fields Management
(added the invoice number field at the document level) → custom field
(added the initial page field at the document level → Metadata Document_PageIndex ())
(added the number of pages field at the document level → Metadata Document_PageCount ())
Metadata to PDI
(and convert XML to CSV)
Database action
(bulk insert of the CSV with the indexes)

With Connect, I can’t replicate this scheme.
Can you help me?

Thank you so much

At this time, getting this information from Connect would require using REST calls after the Create Print Content step.

There’s an example here of how to get the number of pages, for instance.

If you’re up for it, you can find more information on REST calls available here.

Otherwise, you can always use the metadata plugins, though you won’t have the benefit of the PP7 template to help, so you’d be setting all your rules for level creation and field extraction within the Workflow.

The general flow would have Connect create a PDF, then pass it into a default Create Metadata task, next a Metadata Level Creation, and finally field extraction.

If necessary, you can always hide extractable text in your Connect output to help locate specific information. For instance, you could put some white text in the margins to help you locate the start of each document in the Workflow. If the invoice numbers are inconsistently placed on the page, those can also be hidden in the margins to give you a cleaner place to extract them from.
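To make the hidden-marker idea a bit more concrete, here is a rough sketch in plain Python (not Workflow script). It assumes the text of each page has already been extracted into a list of strings, and that the first page of each invoice carries a hypothetical marker of the form INVSTART:<invoice number>. The marker format and the function name are made up for illustration; they are not anything Connect provides.

```python
import re

# Hypothetical marker, hidden as white text on the first page of each invoice.
MARKER = re.compile(r"INVSTART:(\w+)")


def find_documents(page_texts):
    """Return one dict per invoice: its number, first page and page count.

    page_texts: list of strings, one per PDF page, in page order.
    Pages are numbered from 1 here (an assumption; adjust for a 0-based index).
    """
    documents = []
    for page_number, text in enumerate(page_texts, start=1):
        match = MARKER.search(text)
        if match:
            # A marker opens a new invoice on this page.
            documents.append({
                "invoice_number": match.group(1),
                "init_page": page_number,
                "number_of_pages": 1,
            })
        elif documents:
            # No marker: the page belongs to the most recently opened invoice.
            documents[-1]["number_of_pages"] += 1
    return documents
```

Pages that appear before the first marker are ignored, so a cover sheet would not be counted as part of any invoice.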

Once you have the metadata, I’d probably just convert that directly to CSV using a script to read the metadata and output a file. You can find more on the Metadata API here.
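The CSV step might look roughly like the sketch below. It is plain Python and does not use the Workflow Metadata API: it assumes the invoice numbers and page counts have already been read from the metadata into a list of pairs, in the same order as the invoices appear in the PDF. It derives init_page by accumulating the page counts, numbering pages from 1. The function name and the column order are illustrative only.

```python
import csv
import io


def build_index_csv(invoices):
    """Build CSV text from (invoice_number, number_of_pages) pairs.

    init_page is derived by accumulating page counts, assuming the pairs
    are in PDF order and pages are numbered from 1.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["invoice_number", "number_of_pages", "init_page"])
    next_page = 1
    for invoice_number, number_of_pages in invoices:
        writer.writerow([invoice_number, number_of_pages, next_page])
        # The next invoice starts right after this one ends.
        next_page += number_of_pages
    return buffer.getvalue()
```

For example, build_index_csv([("A100", 2), ("A101", 1)]) gives a header row followed by one row per invoice, which can then be written to a file for the bulk insert.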