Error Policies
One of the challenging aspects of distributed services is that they can fail for a variety of reasons. Some failures are caused by incorrect use, but services often fail for reasons outside our control, such as network latency or hardware failure. Error Policies are how we configure Unmeshed to deal with such error scenarios. As a Process Author, you can associate an Error Policy directly with a Step using the Step Definition screen.
Each Error Policy defines four error scenarios. While each scenario responds to a different event, they all share similar properties:
- Run Process. An optional Process to run when an error/timeout occurs.
- Action. Governs how Unmeshed behaves after an error/timeout occurs and the optional Process has run. To fail the associated Steps as a result of the error, use FAIL. To allow additional attempts, use RETRY and configure the relevant retry parameters. Otherwise, pick NONE and Unmeshed takes no further action.
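The three Action values can be pictured as a small decision function. The Python sketch below is illustrative only: the function name, the state strings, and the rule that a Step fails once its retries are exhausted are assumptions made for this example, not Unmeshed's actual implementation.

```python
from enum import Enum


class Action(Enum):
    """The three Action values named in an Error Policy."""
    FAIL = "FAIL"
    RETRY = "RETRY"
    NONE = "NONE"


def next_step_state(action: Action, attempts_used: int, retry_count: int) -> str:
    """Return a hypothetical Step outcome after an error or timeout.

    attempts_used: retries already consumed (hypothetical counter).
    retry_count:   maximum retries allowed by the policy.
    """
    if action is Action.FAIL:
        return "FAILED"
    if action is Action.RETRY:
        # Assumption: once the retry budget is spent, the Step is failed.
        return "RETRYING" if attempts_used < retry_count else "FAILED"
    # NONE: the policy takes no further action on the Step.
    return "UNCHANGED"
```

For example, `next_step_state(Action.RETRY, 0, 2)` yields `"RETRYING"`, while the same call with `attempts_used=2` yields `"FAILED"`.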
For an explanation of when each error scenario is triggered, continue reading below.
Start Timeout
Triggers if a Step fails to start its work within a certain time window. This scenario might happen, for example, when a Standard Worker is assigned work but isn't able to pick it up within the required time window.
For details on how the start timeout is configured, see Start Timeout Configuration.
Completion Timeout
Triggers if a Step fails to complete its work successfully by the required time, starting from its scheduled time. This Timeout allows you to verify that work is completed successfully within a specified time frame, regardless of when the work actually started.
For details on how the completion timeout is configured, see Completion Timeout Configuration.
Response Timeout
Triggers if a Step fails to report status (whether successful or not) within the required time window. The countdown for this timeout starts from the moment the work is picked up.
For details on how the response timeout is configured, see Response Timeout Configuration.
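The completion and response timeouts differ mainly in where their clocks start: the former counts from the Step's scheduled time, the latter from the moment the work is picked up. The Python sketch below illustrates that difference. The function names are hypothetical, and it assumes durations are expressed in milliseconds, as the value 300000 in the sample policy suggests.

```python
from datetime import datetime, timedelta


def completion_deadline(scheduled_at: datetime, duration_ms: int) -> datetime:
    """Completion timeout counts from the scheduled time, not the start time."""
    return scheduled_at + timedelta(milliseconds=duration_ms)


def response_deadline(picked_up_at: datetime, duration_ms: int) -> datetime:
    """Response timeout counts from the moment the work is picked up."""
    return picked_up_at + timedelta(milliseconds=duration_ms)


# A Step scheduled at 12:00:00 but picked up four minutes late.
scheduled = datetime(2024, 1, 1, 12, 0, 0)
picked_up = datetime(2024, 1, 1, 12, 4, 0)

# The completion deadline is unaffected by the late pickup.
print(completion_deadline(scheduled, 300_000))  # 2024-01-01 12:05:00
# The response deadline shifts with the pickup time.
print(response_deadline(picked_up, 120_000))    # 2024-01-01 12:06:00
```

In this example the late pickup leaves only one minute to meet the completion deadline, even though the response window is still two minutes long.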
Fail Config
Triggers when a Step is reported as Failed. As with the other policies, Process Authors may use this policy to configure retry mechanisms and/or a Process to run as a result of the failure.
For details on how the fail policy is configured, see Fail Configuration.
Sample Error Policy
{
  "responseTimeoutConfig": null,
  "completionTimeoutConfig": {
    "duration": 300000,
    "action": "RETRY",
    "runRequest": null
  },
  "startTimeoutConfig": null,
  "failConfig": {
    "action": "RETRY",
    "runRequest": {
      "name": "send_email_notification",
      "namespace": "default",
      "version": null,
      "id": null,
      "correlationId": "error_policy_retry_email",
      "input": {},
      "testRun": false
    },
    "runRequestAfterRetries": true,
    "retryInterval": 5000,
    "retryBackoffExponential": false,
    "retryCount": 2
  }
}
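To make the retry fields in the sample concrete, the Python sketch below reads the retry-related part of failConfig and lists the wait before each retry. The helper name is hypothetical. The sketch assumes retryInterval is in milliseconds and that exponential backoff, when enabled, doubles the interval on each attempt. Neither detail is stated above, so treat both as assumptions.

```python
import json
from typing import List


def retry_delays_ms(fail_config: dict) -> List[int]:
    """Return the assumed wait (in ms) before each retry attempt."""
    interval = fail_config["retryInterval"]
    count = fail_config["retryCount"]
    if fail_config.get("retryBackoffExponential"):
        # Assumption: the interval doubles with every attempt.
        return [interval * 2 ** attempt for attempt in range(count)]
    # Fixed backoff: the same interval before every retry.
    return [interval] * count


# The sample policy, trimmed to its retry-related fields.
policy = json.loads("""
{
  "failConfig": {
    "action": "RETRY",
    "retryInterval": 5000,
    "retryBackoffExponential": false,
    "retryCount": 2
  }
}
""")

print(retry_delays_ms(policy["failConfig"]))  # [5000, 5000]
```

Under these assumptions, the sample policy retries a failed Step twice, waiting five seconds before each attempt.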