Workflow execution is looping infinitely at a wait timer


We have a workflow that executes a task to check for a record in the database. When the record is not found, the workflow waits for one minute and then executes the task again. Below is an excerpt from our workflow.

<intermediateCatchEvent id="BHTimer" name="Wait 1 Minute">
  <incoming>BHNotActive</incoming>
  <outgoing>IsTickOpen</outgoing>
  <timerEventDefinition>
    <timeDuration xsi:type="tFormalExpression">PT1M</timeDuration>
  </timerEventDefinition>
</intermediateCatchEvent>
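The timer's timeDuration of PT1M is an ISO-8601 duration, i.e. one minute. To make the intended behaviour explicit, here is a minimal Java sketch of the loop we expect: check the database; if the record is missing, wait one timer interval and check again; stop as soon as it is found. PollLoopSketch and the recordExists supplier are hypothetical stand-ins for our DB-check task, not Activiti APIs, and the sketch only counts timer firings instead of actually waiting.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class PollLoopSketch {

    // PT1M in the BPMN timer is an ISO-8601 duration: one minute.
    static final Duration WAIT = Duration.parse("PT1M");

    /**
     * Models the intended workflow loop: run the DB check and, if the record is
     * missing, let the timer fire once and check again. Returns how many times the
     * timer fired before the record was found, or -1 if maxChecks was reached.
     * recordExists is a hypothetical stand-in for the DB-check task.
     */
    static int checksUntilFound(BooleanSupplier recordExists, int maxChecks) {
        for (int timerFirings = 0; timerFirings < maxChecks; timerFirings++) {
            if (recordExists.getAsBoolean()) {
                return timerFirings; // record found: leave the loop, process continues
            }
            // Record not found: the real process would wait on the timer for WAIT here.
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Timer interval: " + WAIT.getSeconds() + " s");
        // Simulate a record that appears on the fourth check.
        int[] calls = {0};
        int firings = checksUntilFound(() -> ++calls[0] >= 4, 100);
        System.out.println("Timer fired " + firings + " times before the record was found");
    }
}
```

With the record appearing on the fourth check, the sketch reports three timer firings and stops. In our production processes the loop instead keeps going after the record exists.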

We noticed that this task keeps looping infinitely even after the DB record has been added. Strangely, the interval between executions drops from 1 minute to about 200 ms, causing millions of records to accumulate in the ACT_HI_ACTINST table. Below are the table statistics for one of many such processes in our system.
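To put the change in frequency into numbers, the Java snippet below compares how many timer firings a single looping process instance would produce per day at the configured PT1M interval versus the roughly 200 ms interval we observe. It is plain arithmetic on those two intervals; the class and method names are illustrative only, and it does not try to model how many history rows each firing writes.

```java
import java.time.Duration;

public class FiringRate {

    /** Number of timer firings in the given period at a fixed interval (integer division). */
    static long firingsPer(Duration period, Duration interval) {
        return period.toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        Duration day = Duration.ofDays(1);
        Duration configured = Duration.parse("PT1M"); // value from the BPMN timer
        Duration observed = Duration.ofMillis(200);   // approximate observed interval

        System.out.println("Configured: " + firingsPer(day, configured) + " firings/day"); // 1440
        System.out.println("Observed:   " + firingsPer(day, observed) + " firings/day");   // 432000
        System.out.println("Speed-up:   " + configured.toMillis() / observed.toMillis() + "x"); // 300
    }
}
```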

Within a few seconds the event has executed thousands of times, and it continues indefinitely, creating millions of entries for the same job in the ACT_HI_ACTINST and ACT_RU_EXECUTION tables.

Running the queries below returns millions of records:

1. select * from ACT_HI_ACTINST where PROC_INST_ID_ = 'f33c539a-dfe2-11e8-9d30-0050569941b2';
2. select * from ACT_RU_EXECUTION where PROC_INST_ID_ = 'f33c539a-dfe2-11e8-9d30-0050569941b2';

Following are the row counts of the Activiti tables at the time we ran into the performance issues.

Runtime tables (table name: number of records)

ACT_RU_EXECUTION: 3,435,162
ACT_RU_TASK: 318,122
ACT_RU_IDENTITYLINK: 251,334
ACT_RU_VARIABLE: 265,008

History tables (table name: number of records)

ACT_HI_IDENTITYLINK: 2,526,867
ACT_HI_PROCINST: 54,564,894
ACT_HI_ACTINST: 28,169,298
ACT_HI_TASKINST: 4,769,590
ACT_HI_VARINST: 8,711,507

Some of these processes become orphaned (they have not ended even though a close was issued). For such processes we also noticed the following message in the exception message column of the ACT_RU_JOB table: "JobEntity [id=2786e249-dff6-11e8-a9c8-005056990bf2] was updated by another transaction concurrently".
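As far as we understand, that message comes from Activiti's optimistic locking: each row carries a revision number (the REV_ column), updates include that revision in their WHERE clause, and when such an update matches no rows the engine throws an ActivitiOptimisticLockingException with the "was updated by another transaction concurrently" text. The Java below is a simplified in-memory model of that check, not Activiti's code; JobRow and JobStore are hypothetical, and IllegalStateException stands in for Activiti's exception.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticLockSketch {

    /** Hypothetical job row: id, revision counter (like REV_), and lock owner. */
    static final class JobRow {
        final String id;
        final int rev;
        final String lockOwner;

        JobRow(String id, int rev, String lockOwner) {
            this.id = id;
            this.rev = rev;
            this.lockOwner = lockOwner;
        }
    }

    /** Hypothetical in-memory table whose update() mimics a revision-checked UPDATE. */
    static final class JobStore {
        private final Map<String, JobRow> rows = new HashMap<>();

        synchronized void insert(JobRow row) { rows.put(row.id, row); }

        synchronized JobRow read(String id) { return rows.get(id); }

        /** Writes the new owner only if the caller's revision is still current. */
        synchronized void update(JobRow readCopy, String newOwner) {
            JobRow current = rows.get(readCopy.id);
            if (current == null || current.rev != readCopy.rev) {
                // No row matched id + revision: another node updated it first.
                // (In Activiti this is where ActivitiOptimisticLockingException is thrown.)
                throw new IllegalStateException("JobEntity [id=" + readCopy.id
                        + "] was updated by another transaction concurrently");
            }
            rows.put(readCopy.id, new JobRow(readCopy.id, readCopy.rev + 1, newOwner));
        }
    }

    public static void main(String[] args) {
        JobStore store = new JobStore();
        store.insert(new JobRow("job-1", 1, null));

        // Both cluster nodes read the same job at revision 1.
        JobRow seenByNodeA = store.read("job-1");
        JobRow seenByNodeB = store.read("job-1");

        store.update(seenByNodeA, "node-A"); // succeeds; revision becomes 2
        try {
            store.update(seenByNodeB, "node-B"); // stale revision 1: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("Owner: " + store.read("job-1").lockOwner); // node-A
    }
}
```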

We have a purge job that removes data for completed processes (those with END_TIME_ populated in the ACT_HI_PROCINST table). These looping processes are never purged, however, because they never end.

We have examined our workflow and do not see any parallel execution paths, so we are not sure why this error occurs. One thing to note is that the application is deployed in a two-node cluster. Could both nodes be picking up the process for execution at the same time?
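Our (possibly incomplete) understanding of how two nodes could both run the same job: each node's job executor acquires due jobs by writing its own id into LOCK_OWNER_ and an expiry time into LOCK_EXP_TIME_ on the ACT_RU_JOB row, and a job whose lock has expired is treated as free again. If that is right, a job that runs longer than the lock time could be picked up by the second node. The Java below is a hypothetical, simplified model of that acquisition rule; the class, the five-minute lock time, and the timings are illustrative assumptions, not Activiti's implementation or defaults.

```java
import java.time.Duration;
import java.time.Instant;

public class JobLockSketch {

    /** Hypothetical model of the lock columns on a job row (LOCK_OWNER_, LOCK_EXP_TIME_). */
    static final class Job {
        String lockOwner;
        Instant lockExpiry;
    }

    /**
     * A node may acquire the job if it is unlocked or its lock expired at or before 'now'.
     * On success the node owns the job until now + lockTime.
     * 'synchronized' stands in for the database's atomic conditional update.
     */
    static synchronized boolean tryAcquire(Job job, String node, Instant now, Duration lockTime) {
        boolean free = job.lockOwner == null || !job.lockExpiry.isAfter(now);
        if (free) {
            job.lockOwner = node;
            job.lockExpiry = now.plus(lockTime);
        }
        return free;
    }

    public static void main(String[] args) {
        Job job = new Job();
        Instant t0 = Instant.EPOCH;                // arbitrary reference time
        Duration lockTime = Duration.ofMinutes(5); // illustrative value only

        System.out.println(tryAcquire(job, "node-A", t0, lockTime));                 // true
        System.out.println(tryAcquire(job, "node-B", t0.plusSeconds(60), lockTime)); // false: A holds it
        // If node A is still running the job after its lock expires, node B can take it too.
        System.out.println(tryAcquire(job, "node-B", t0.plusSeconds(301), lockTime)); // true
        System.out.println("Owner now: " + job.lockOwner);                           // node-B
    }
}
```

In this model the second node only gets the job after the first node's lock has expired; whether something like that is happening in our cluster is part of what we are asking.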

Our questions are:

1. How does Activiti make process execution cluster-safe? Is there any cluster-specific configuration?
2. Are the workflows that we generated using the designer flawed? Please have a look at the attached workflow snippet and diagram, and advise.

Diagram image: https://i.stack.imgur.com/xMQWm.jpg

If anyone needs the complete workflow XML, I can attach it as well; I left it out because of the word limit.

Workflows are generated using BPMN Designer. Activiti version: 5.17.0; database: Oracle; web server: Tomcat.

This is causing serious performance issues in our production environment; any help in resolving it would be much appreciated.
