Transactions

The tSM Process Engine is designed for distributed, microservice-based deployments. It ensures that each part of a process either completes successfully or rolls back to a known stable state (a wait state). This transaction-oriented behavior guarantees data consistency, reliability, and clean handling of exceptions.

However, distributed environments raise several challenges, such as handling non-transactional calls, preventing duplicate reprocessing, and prioritizing jobs. All of these must be addressed to keep the system robust.

Wait States

A wait state is any point where the engine persists the current process context to the database and commits the transaction. Common wait states include:

User Tasks (manual user interaction)
Receive Tasks (waiting for a message or signal)
Timer Events (waiting until a timer expires)
Message/Signal Events (waiting for an external trigger)
External Tasks (handed off to a microservice or worker process)

Once a wait state is reached, the process engine:

Saves the execution context (all process variables, states, tokens) and business object changes (e.g. Order status, characteristics) to the database.
Commits the transaction so that the work done so far is persistent.
Pauses until the next event or message triggers the continuation.

Practical note: User tasks, receive tasks, and other wait states automatically handle transaction commits. There is no extra configuration needed to create a boundary here.

Transaction Boundaries

A transaction spans the steps between two wait states, and the wait states themselves mark its boundaries. The engine executes the BPMN steps in between (service tasks, gateways, script tasks, etc.) in one transaction. If an error occurs in that scope and is not handled by an error boundary event, the engine rolls back to the last committed wait state.

Non-Transactional Calls

Microservices often rely on REST or external calls that are not part of the engine’s transaction. Important points:

Failed external calls can trigger a rollback in the engine’s transaction, but partial updates in another system will not revert automatically.
This mismatch may cause inconsistencies and can require compensation or retries.

Asynchronous Continuations

Complex or lengthy operations can be split into smaller asynchronous steps. Setting asyncBefore or asyncAfter on an activity adds a commit point, so each portion is committed independently:

<bpmn:serviceTask id="UpdateBillingSystem"
                  name="Update Billing"
                  camunda:asyncAfter="true"
                  camunda:expression="${@billingSystem.updateOrder(#order)}"/>

asyncBefore commits right before the task begins.
asyncAfter commits right after the task ends.

Use asynchronous calls for:

Long-running flows (avoid locking the database).
REST service calls where you don’t want to undo prior successful tasks if the call fails.

Always set asyncAfter on a non-transactional call, so that an exception in a later step cannot roll the process back past the call and cause it to be repeated.
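The effect of an asyncAfter boundary can be illustrated with a toy model. This is a minimal sketch and not the tSM engine API: MiniEngine, update_billing, and notify are hypothetical names, and a real engine would continue after the boundary in a job executor thread rather than in the same call.

```python
class MiniEngine:
    """Toy model: runs steps in order and commits only at async boundaries."""

    def __init__(self, steps):
        self.steps = steps          # list of (name, action, async_after)
        self.committed_index = 0    # index of the first step not yet committed

    def run(self):
        """Run from the last committed position; on error, fall back to it."""
        index = self.committed_index
        try:
            while index < len(self.steps):
                _name, action, async_after = self.steps[index]
                action()
                index += 1
                if async_after:
                    self.committed_index = index   # commit at the boundary
            self.committed_index = index           # commit at the end
            return True
        except RuntimeError:
            return False   # uncommitted steps run again on the next attempt


calls = {"billing": 0, "notify": 0}

def update_billing():
    calls["billing"] += 1           # stands in for a non-transactional REST call

def notify():
    calls["notify"] += 1
    if calls["notify"] == 1:
        raise RuntimeError("transient failure")

engine = MiniEngine([
    ("UpdateBilling", update_billing, True),   # asyncAfter="true"
    ("Notify", notify, False),
])
first_attempt = engine.run()    # Notify fails, billing is already committed
second_attempt = engine.run()   # resumes at Notify, billing is not repeated
```

Without the boundary on UpdateBilling, the second attempt would start from the beginning and call the billing system twice.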

Rollback on Exception

If an exception arises in a transaction scope:

Changes made in that scope are discarded.
The process reverts to the most recent committed wait state.
Control returns to the caller or job executor.
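The rollback steps above can be sketched as follows. This is an illustrative model only: ProcessInstance and its methods are hypothetical and do not reflect how the engine stores its state.

```python
import copy


class ProcessInstance:
    """Toy model of variables that are only persisted at a wait state."""

    def __init__(self, variables):
        self.committed = dict(variables)   # state saved at the last wait state

    def execute(self, steps):
        """Apply all steps to a working copy; keep it only if none fails."""
        working = copy.deepcopy(self.committed)
        try:
            for step in steps:
                step(working)
        except Exception:
            return False                   # working copy is discarded
        self.committed = working           # commit at the next wait state
        return True


def mark_billed(variables):
    variables["status"] = "BILLED"

def failing_step(variables):
    raise ValueError("downstream error")

instance = ProcessInstance({"status": "NEW"})
rolled_back = instance.execute([mark_billed, failing_step])
status_after_failure = instance.committed["status"]
succeeded = instance.execute([mark_billed])
status_after_success = instance.committed["status"]
```

The change made by mark_billed in the failed attempt never becomes visible, because the whole scope is discarded together.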

Duplicate Reprocessing

When a rollback happens, external calls within the rolled-back scope may need re-invocation. For non-idempotent operations (like “charge credit card”), re-calling can cause duplicates unless you build safeguards:

Idempotency keys or unique request tokens
A separate async boundary, so that the user’s completed step is committed before the external call
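An idempotency key could work roughly as in the sketch below. PaymentGateway is a hypothetical stand-in for the external system, which is assumed to remember the result it returned for each key.

```python
import uuid


class PaymentGateway:
    """Hypothetical external service that deduplicates by idempotency key."""

    def __init__(self):
        self._results = {}       # idempotency key -> receipt already issued
        self.charges_made = 0

    def charge(self, idempotency_key, amount):
        # A repeated key returns the stored receipt instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        receipt = {"receipt": self.charges_made, "amount": amount}
        self._results[idempotency_key] = receipt
        return receipt


gateway = PaymentGateway()

# Generated once and kept with the process (for example as a process
# variable), so a re-invocation after rollback sends the same key.
key = str(uuid.uuid4())

first = gateway.charge(key, 100)
retry = gateway.charge(key, 100)   # re-invocation after a rollback
```

The key has to be created and committed before the call is attempted. A key generated inside the rolled-back scope would change on every retry and protect nothing.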

Transaction Integration

Each API action (e.g., starting a process, completing a task) runs in the calling service thread. When the process hits a wait state:

The transaction commits.
The thread is released.
The engine waits for the next trigger.

For messaging (like Kafka), messages are published only after the main database transaction commits. If the transaction fails, no message is sent (similar to a Change-Data-Capture pattern).
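The publish-after-commit behavior can be pictured with a small buffer. This is a sketch under assumptions: Transaction is a hypothetical class, the broker is a plain list standing in for a Kafka producer, and the database commit itself is left out.

```python
class Transaction:
    """Toy transaction that buffers outgoing messages until commit."""

    def __init__(self, broker):
        self._broker = broker    # stand-in for a message producer
        self._pending = []       # messages created inside the transaction

    def send(self, topic, payload):
        self._pending.append((topic, payload))   # queued, not yet published

    def commit(self):
        # A real implementation would commit the database transaction first
        # and dispatch the queued messages only if that commit succeeded.
        self._broker.extend(self._pending)
        self._pending.clear()

    def rollback(self):
        self._pending.clear()    # nothing ever reaches the broker


broker = []

failed = Transaction(broker)
failed.send("order-events", {"orderId": 1, "status": "BILLED"})
failed.rollback()
after_rollback = list(broker)

successful = Transaction(broker)
successful.send("order-events", {"orderId": 1, "status": "BILLED"})
successful.commit()
after_commit = list(broker)
```

Consumers therefore never see an event for a state change that was rolled back.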

Practical summary:

Use asynchronous communication wherever possible.
When a Kafka message is created, it is queued and dispatched only after the transaction commits.
For complex logic or long processing, external tasks are recommended (they run in a separate transaction and microservice).
asyncAfter is encouraged after a non-transactional call.
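If you need validations or immediate results, keep the logic in one transaction (no async).
For longer flows that span several services, use business transactions (compensation and SAGA-like patterns) instead of a single technical transaction.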

Concurrency Issues

When multiple transactions interact with the same process, concurrency problems can arise. Examples include:

A user completes a task at the same time a message arrives.
A timer or async continuation triggers while another step is executing.
Parallel gateways or multi-instance tasks require synchronization.

Optimistic Locking

The engine uses a revision column in the database to detect concurrent modifications. If two updates collide, one fails with an OptimisticLockingException. The engine retries internal jobs that fail this way.

However, if you call an external service within the same scope, it might be re-called after retry, leading to duplicate side-effects. Therefore:

Add asyncBefore or asyncAfter around non-transactional calls, so the call is retried in isolation.
For more advanced scenarios, consider a distributed lock on the business key (e.g., in Redis) to ensure that only one transaction modifies a given entity at a time.
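Revision-based optimistic locking can be sketched as a compare-and-update on a version number. Row, update, and OptimisticLockingError are hypothetical names used for illustration. In the engine, the check happens in the database and surfaces as an OptimisticLockingException.

```python
class OptimisticLockingError(Exception):
    """Raised when a row changed after it was read."""


class Row:
    def __init__(self, value):
        self.value = value
        self.revision = 1


def update(row, expected_revision, new_value):
    """Apply the update only if nobody else changed the row in the meantime."""
    if row.revision != expected_revision:
        raise OptimisticLockingError(
            f"expected revision {expected_revision}, found {row.revision}"
        )
    row.value = new_value
    row.revision += 1


row = Row("WAITING")

# Two transactions read the same revision at the same time.
revision_seen_by_user = row.revision
revision_seen_by_message = row.revision

update(row, revision_seen_by_user, "TASK_COMPLETED")      # first writer wins
try:
    update(row, revision_seen_by_message, "MESSAGE_RECEIVED")
    collided = False
except OptimisticLockingError:
    collided = True                                        # second writer loses

# The retry re-reads the current revision and then succeeds.
update(row, row.revision, "MESSAGE_RECEIVED")
```

Everything the losing transaction did before the collision is repeated on retry, which is why external calls inside that scope need their own async boundary.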

External Task Mechanism

External tasks delegate specialized work to another microservice:

The engine persists the task.
A worker fetches and locks it, performs the work, then reports completion.
The process continues upon notification.

Advantages:

Microservices can be in any language.
The process engine is decoupled from the external logic.
If a worker fails mid-task, the lock eventually expires and another instance can take over.
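The fetch-and-lock cycle, including lock expiry, can be modeled as below. This is an in-memory sketch with hypothetical names (ExternalTaskQueue, fetch_and_lock, complete), not the engine's REST API. Time is passed in explicitly to keep the example deterministic.

```python
class ExternalTaskQueue:
    """Toy model of external tasks with time-limited locks."""

    def __init__(self):
        self._tasks = {}   # task id -> lock and completion state

    def add(self, task_id):
        self._tasks[task_id] = {"locked_until": 0, "worker": None, "done": False}

    def fetch_and_lock(self, worker, now, lock_duration):
        """Return the first open task whose lock is free or has expired."""
        for task_id, task in self._tasks.items():
            if not task["done"] and task["locked_until"] <= now:
                task["locked_until"] = now + lock_duration
                task["worker"] = worker
                return task_id
        return None

    def complete(self, task_id, worker, now):
        """Accept completion only from the worker that holds a valid lock."""
        task = self._tasks[task_id]
        if task["worker"] != worker or task["locked_until"] <= now:
            return False
        task["done"] = True
        return True


queue = ExternalTaskQueue()
queue.add("provision-1")

taken_by_a = queue.fetch_and_lock("worker-a", now=0, lock_duration=30)
blocked_for_b = queue.fetch_and_lock("worker-b", now=10, lock_duration=30)
# worker-a crashes; its lock expires at t=30 and worker-b takes over.
taken_by_b = queue.fetch_and_lock("worker-b", now=31, lock_duration=30)
late_result_from_a = queue.complete("provision-1", "worker-a", now=32)
completed_by_b = queue.complete("provision-1", "worker-b", now=40)
```

Because a task can be picked up again after its lock expires, the work a worker performs should itself be safe to repeat.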

Practical Recommendations

Favor asynchronous boundaries for remote calls to avoid lengthy blocking transactions.
Use compensation or fallback flows for partial failures in multi-step operations.
In advanced concurrency use cases, consider a distributed lock on the business key (e.g., Redis).
Monitor concurrency at parallel gateways and multi-instance tasks to handle optimistic locking exceptions.

Job Executor and Prioritization

The job executor handles timers, asynchronous continuations, and retries. It looks for due jobs in the queue and processes them, subject to locking and concurrency controls.

Job Priority

Jobs can have a numeric priority. Higher-priority jobs are picked up first when the executor is under load. For critical work, set a higher priority.
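Priority-based acquisition can be sketched with a heap. JobQueue is a hypothetical class, and the job names and priority values are made up. The assumption that equal priorities are served in creation order belongs to this sketch and is not documented engine behavior.

```python
import heapq
import itertools


class JobQueue:
    """Toy job queue: highest priority first, then order of creation."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()   # tie-breaker for equal priority

    def add(self, name, priority=0):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._sequence), name))

    def acquire(self):
        """Return the next job name, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


jobs = JobQueue()
jobs.add("send-newsletter", priority=0)
jobs.add("activate-service", priority=50)
jobs.add("cleanup-history", priority=0)

order = [jobs.acquire() for _ in range(3)]
```

Here the high-priority activate-service job is acquired first even though it was created second.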

Handling Failed Jobs

If a job fails:

The retry counter decreases.
The job is unlocked so that another attempt can be made.
Once the retries reach zero, an incident is created, which requires manual resolution.
For non-idempotent operations, the default retry count may be set to zero to prevent unintended re-calls.
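The retry and incident handling described above can be sketched as follows. Job and execute are hypothetical names, and the retry count of 2 is chosen only to keep the example short.

```python
class Job:
    def __init__(self, action, retries=3):
        self.action = action
        self.retries = retries     # remaining attempts
        self.attempts = 0          # how often the action was actually run
        self.incident = None       # set once the retries are exhausted


def execute(job):
    """Run the job once; return True on success."""
    if job.retries <= 0:
        return False               # not picked up until the incident is resolved
    job.attempts += 1
    try:
        job.action()
        return True
    except Exception as exc:
        job.retries -= 1           # the retry counter decreases
        if job.retries == 0:
            job.incident = f"job failed: {exc}"
        return False


def always_fails():
    raise RuntimeError("billing system unavailable")


job = Job(always_fails, retries=2)
results = [execute(job) for _ in range(3)]
```

The third call does not run the action at all: with no retries left, the job waits for manual resolution.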

Business Transactions

In large, distributed workflows spanning multiple microservices, a single ACID transaction is impractical. Instead, the engine orchestrates “business transactions,” often by combining:

Large operations split into multiple steps.
Compensation or error flows that revert partial changes if a later step fails.
SAGA-like patterns, in which each operation has a local commit and a compensating action.

Example: Order Handling

Reserve inventory in one microservice.
Process payment in another.
Arrange shipment in a third.

If shipment fails, the engine compensates by reversing payment and releasing inventory. Model this with BPMN error or compensation events.
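The order example can be sketched as a sequence of steps, each paired with a compensating action. This is plain Python for illustration: run_saga and the step names are hypothetical, and in the engine the same flow would be modeled with BPMN compensation or error events.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    compensations = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Compensate in reverse order of completion.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensation)
    return True


log = []

def arrange_shipment():
    raise RuntimeError("no carrier available")

succeeded = run_saga([
    (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
    (lambda: log.append("process payment"), lambda: log.append("reverse payment")),
    (arrange_shipment, lambda: log.append("cancel shipment")),
])
```

Only steps that completed are compensated, so the failed shipment step itself is not undone.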

With the right approach to wait states, async continuations, external tasks, job prioritization, concurrency handling, and business transaction logic, the process engine coordinates complex telecom and microservice workflows reliably. Paying attention to non-transactional calls, reprocessing, and concurrency will keep your system robust under heavy load.