I am coming back to this issue: unfortunately, even after the fake request in the morning, the problem persists. So the cleanup service does indeed affect performance for the requests that follow it.
The data shows that the fake request did make a difference; however, we are still experiencing random lag spikes. Is it on the OL Connect side or the VM side? Hard to tell. Telemetry collected with Grafana exporters shows low system usage, so CPU, RAM, or disk shouldn't be the issue, but I would like to pinpoint this more precisely.
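For what it's worth, this is roughly how I correlate a spike with the exporter data. A minimal sketch, assuming a Prometheus backend behind Grafana and node_exporter metrics; the address, spike timestamp, and query are placeholders, not my exact setup:

```python
import requests  # assumes the Prometheus HTTP API is reachable

PROM = "http://localhost:9090"  # hypothetical Prometheus address

def max_over_window(query: str, start: float, end: float, step: str = "15s") -> float:
    """Return the max sample of a PromQL query over [start, end] (unix seconds)."""
    r = requests.get(
        f"{PROM}/api/v1/query_range",
        params={"query": query, "start": start, "end": end, "step": step},
        timeout=10,
    )
    r.raise_for_status()
    results = r.json()["data"]["result"]
    return max(
        (float(v[1]) for series in results for v in series["values"]),
        default=0.0,
    )

# Example: peak CPU busy % in a 10-minute window around one lag spike
spike_ts = 1710000000  # hypothetical timestamp of a spike
cpu_query = '100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)'
print("max CPU % around spike:", max_over_window(cpu_query, spike_ts - 300, spike_ts + 300))
```

If CPU, memory, and disk all stay flat around a spike, that would support the suspicion that the bottleneck is internal to OL Connect rather than the VM.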
What I have observed by adding timers around each OL Connect step (see the sketch after the table) is that the data mining, content creation, and output creation steps are the ones that take more time; filestore upload and job creation stay flat.
Below is an example of two runs, executed at different times, with the same payload, template, data mapping config, etc. (times in ms):
SERVICE          | RUN 1 | RUN 2
-----------------|-------|------
FILESTORE UPLOAD |    82 |    83
DATAMINING       |  2518 |   722
CONTENT CREATION |  5426 |  1621
JOB CREATION     |   324 |   319
OUTPUT CREATION  |  4821 |  1123
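For transparency, the timings above come from a thin wrapper like the one below. A minimal sketch: the base URL, port, and REST routes are placeholders from memory, not the exact OL Connect API paths, so check the REST API docs before reusing it:

```python
import time
import requests

BASE = "http://localhost:9340/rest/serverengine"  # placeholder; adjust to your OL Connect Server

def timed(label: str, method: str, path: str, **kwargs) -> requests.Response:
    """Run one REST call and print its wall-clock duration in ms."""
    t0 = time.perf_counter()
    resp = requests.request(method, BASE + path, timeout=600, **kwargs)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"{label:<20} {elapsed_ms:8.0f} ms  (HTTP {resp.status_code})")
    resp.raise_for_status()
    return resp

# Illustrative placeholders only, not the exact OL Connect routes:
# upload = timed("FILESTORE UPLOAD", "POST", "/filestore/DataFile", data=payload)
# dm     = timed("DATAMINING",       "POST", f"/workflow/datamining/{config_id}/{data_id}")
# ...same pattern for content creation, job creation, and output creation
```

Since the wrapper measures the full round trip, network latency is included, but both runs go through the same path, so the spread between them should still be attributable to the server side.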
The long-awaited new structured logging mentioned in another thread doesn't seem to be released yet, or perhaps I'm missing its benefits; either way, I would gladly take some guidance on how to ingest the logs.
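In case the feature is already out and I just missed it: assuming it emits one JSON object per line, I would expect to ingest it with something along these lines. The file name and field names below are pure guesses on my part:

```python
import json
from pathlib import Path

LOG = Path("connect-server.log")  # hypothetical path; I don't know the actual file name

# Assuming one JSON object per line, pull out slow operations so the
# spikes can be matched against my per-step timers above.
for line in LOG.read_text(encoding="utf-8").splitlines():
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip non-JSON lines (banners, stack traces, ...)
    if event.get("durationMs", 0) > 1000:  # field name is a guess
        print(event.get("timestamp"), event.get("operation"), event.get("durationMs"))
```

If someone from the staff can confirm the actual format and field names, I can wire this into our existing Grafana/Loki stack instead.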
I haven't opened an official ticket yet, but maybe that would streamline the debugging for the staff?