We have added a new schema to the MySQL database installed with Connect to store PDI details that are created with the PlanetPress Image plugin.
The purpose of the PlanetPress Search database is to store PDF invoices that are not older than 3 months.
There is a process which inserts details into the Search database at every production run, and there is another process which deletes files that are older than 3 months from the archive folders every hour.
When we run a search on the database we often find references to invoices which no longer exist in the archive folders and as such can’t be viewed in PlanetPress Search.
I discovered I have to manually rebuild the Search database to synchronise records in the database with actual files in the archive folders. This operation sometimes takes up to 1 hour, depending on the number of files/records.
We want to be able to rebuild the database once or twice a day at specific times during quieter hours, so that the database is almost always up to date when a user performs a search, saving them the time they would otherwise have spent rebuilding the database beforehand. Is there an option for this in PlanetPress Search? If not, how can this be achieved with a JavaScript script that could be automated in Workflow?
I suppose that with “I discovered I have to manually rebuild the Search database to synchronise records in the database with actual files in the archive folders” you’re referring to the “Rebuild Database” option available in the PlanetPress Search application (Options > Rebuild Database…)? As far as I know, this is the only option to rebuild the contents of an existing database.
I have to say that I’ve never looked into the contents of the database, but my guess is that it would simply need a Workflow process that looks into the Search database and extracts a list of all PDFs.
Then, with a Folder Listing plugin, get a list of all PDFs that physically exist in the folder(s).
Now that you have two lists, compare them and delete the records from the Search database that no longer have a PDF counterpart on disk.
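The comparison step could be sketched in plain JavaScript along these lines. This is only a minimal sketch under assumptions: `findOrphanedRecords`, the `id` and `path` fields, and the sample data are all hypothetical, and it presumes the record list and the file list have already been obtained by other means. It does not read from or write to the Search database.

```javascript
// Hypothetical helper: returns the database records whose PDF path
// does not appear in the list of files found on disk.
// Paths are compared case-insensitively, as Windows paths usually are.
function findOrphanedRecords(dbRecords, filesOnDisk) {
  var onDisk = {};
  var i;
  // Index the files on disk for quick lookup.
  for (i = 0; i < filesOnDisk.length; i++) {
    onDisk[filesOnDisk[i].toLowerCase()] = true;
  }
  var orphans = [];
  for (i = 0; i < dbRecords.length; i++) {
    var key = dbRecords[i].path.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(onDisk, key)) {
      orphans.push(dbRecords[i]);
    }
  }
  return orphans;
}

// Sample data (assumed shape, not the real Search schema).
var records = [
  { id: 1, path: "C:\\Archive\\inv001.pdf" },
  { id: 2, path: "C:\\Archive\\inv002.pdf" }
];
var files = ["c:\\archive\\INV001.pdf"];
var orphans = findOrphanedRecords(records, files);
// orphans contains only the record with id 2
```

The sketch only identifies candidate records. How to remove them safely depends on how the Search database is actually structured, which I haven't verified.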
@hamelj
I don’t think that is the way it works. The indexes in the Search database don’t seem to be concerned with the actual PDF files, but rather with the information inside the PDI files. In addition, as far as I can see, PP Search seems to insert one table per document in the database, and there are also various other tables: table_document, table_path, table_index, table_pdf, etc.
So I am unsure how to approach this.
So simply using a folder listing of PDFs to remove their corresponding tables from the database would corrupt the database.
I need a clean and proper way to frequently rebuild the contents of the Search database so that it stays in sync with the search folders.
Is there any command line option to run the rebuild/refresh minimised as a background task? At the moment, the PP Search GUI is started when I run it from the command line.
As a feature request, I would suggest a way to index into PP Search from the Connect Output Creation Wizard so that it writes these search indexes directly into the Connect SQL database. A checkbox is probably all we need in the GUI either on the Job Creation Wizard when adding metadata or in the Output Creation Wizard when creating the PDF.
The user can then specify within the Connect Server Configuration how long they want to preserve the PP Search records for and how often they want to perform a refresh/rebuild/cleanup on the PP Search records, if these actions are still required.