What is the difference between Circuit Breaker and Retry in a Spring Boot microservice?

A colleague asked me what the difference between Circuit Breaker and Retry is, but I was not able to answer correctly. All I know is that a circuit breaker is useful when there is a heavy request payload, but that can also be achieved using retry. So when should I use Circuit Breaker, and when Retry?

Also, is it possible to use both on the same API?

There are 3 answers

Peter Csala (best answer)

Several years ago I wrote a resilience catalog describing different mechanisms. I originally created this document for co-workers and later shared it publicly. Please allow me to quote the relevant parts here.

Retry

Categories: reactive, after the fact

The relation between retries and attempts: n retries means at most n+1 attempts. The +1 is the initial request; if it fails (for whatever reason), the retry logic kicks in. In other words, the 0th step is executed with no delay penalty.
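A minimal Java sketch of that counting, assuming the operation signals failure by throwing a RuntimeException. SimpleRetry and its parameters are hypothetical names, not a library API; the first attempt runs without any delay, and maxRetries = n yields at most n + 1 calls.

```java
import java.util.function.Supplier;

// Minimal retry sketch: maxRetries = n allows at most n + 1 attempts.
// Attempt 0 is the initial request and runs immediately; only retries wait.
final class SimpleRetry {
    private SimpleRetry() {}

    static <T> T execute(Supplier<T> operation, int maxRetries, long delayMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                pause(delayMillis); // the delay penalty is paid by retries only
            }
            try {
                return operation.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and (maybe) try again
            }
        }
        throw lastFailure; // all n + 1 attempts failed
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

With maxRetries = 2, an operation that always fails is invoked exactly three times before the last exception is rethrown.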

There are situations where your requested operation relies on a resource which might not be reachable at a certain point in time. In other words, there can be a temporary issue which will go away sooner or later. These sorts of issues can cause transient failures. With retries you can overcome these problems by redoing the same operation at a specific moment in the future. To be able to use this mechanism, the following criteria should be met:

  • The potentially introduced observable impact is acceptable
  • The operation can be redone without any irreversible side effect
  • The introduced complexity is negligible compared to the promised reliability

Let’s review them one by one:

  • The word failure indicates that the effect is observable by the requester as well, for example via higher latency, reduced throughput, etc. If the “penalty” (delay or reduced performance) is unacceptable, then retry is not an option for you.
  • This requirement is also known as idempotency. If I call the action with the same input several times, it will produce the exact same result. In other words, the operation acts as if it depends only on its parameters and nothing else (like other objects' state) influences the result.
  • Even though this condition is one of the most crucial, it is the one that is almost always forgotten. As always, there are trade-offs (if I introduce Z, it will increase X but it might decrease Y).
    • We should be fully aware of them, otherwise they will give us unwanted surprises at the least expected time.
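To make the idempotency criterion concrete, here is a hypothetical in-memory account store (AccountStore and its methods are made up for illustration). Setting an absolute balance can be safely retried, while a plain deposit applies its effect again on every retry; deduplicating by a caller-supplied id is one common way to make such an operation safe to redo.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory store illustrating which operations are safe to retry.
final class AccountStore {
    private final Map<String, Long> balances = new HashMap<>();
    private final Set<String> appliedPaymentIds = new HashSet<>();

    // Idempotent: repeating the call with the same input leaves the same state.
    void setBalance(String account, long amount) {
        balances.put(account, amount);
    }

    // Not idempotent: every retry adds the amount again (an irreversible side effect).
    void deposit(String account, long amount) {
        balances.merge(account, amount, Long::sum);
    }

    // Made safe to redo: a repeated paymentId is ignored.
    void depositOnce(String paymentId, String account, long amount) {
        if (appliedPaymentIds.add(paymentId)) {
            deposit(account, amount);
        }
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```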

Circuit Breaker

Categories: proactive, before the fact

It is hard to categorize the circuit breaker because it is pro- and reactive at the same time. It detects that a given downstream system is malfunctioning (reactive) and it protects the downstream systems from being flooded with new requests (proactive).

This is one of the most complex patterns, mainly because it uses different states to define different behaviours. Before we jump into the details, let's see why this tool exists at all:

Circuit breaker detects failures and prevents the application from trying to perform the action that is doomed to fail (until it is safe to retry) - Wikipedia

So, this tool works as a mini data and control plane. The requests go through this proxy, which examines the responses (if any) and counts consecutive failures. If a predefined threshold is reached, the transfer is suspended temporarily and requests fail immediately.
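A simplified sketch of that proxy in Java, assuming the downstream call is a plain Supplier that throws on failure. TrippingProxy and OpenException are hypothetical names. Recovery (the timer and trial request) is deliberately left out, so once this version trips it stays tripped.

```java
import java.util.function.Supplier;

// Simplified proxy: forwards calls, counts consecutive failures and, once the
// threshold is reached, fails immediately without calling downstream.
// Hypothetical names; recovery (timer / trial request) is intentionally omitted.
final class TrippingProxy<T> {
    static final class OpenException extends RuntimeException {
        OpenException() { super("threshold reached: failing fast"); }
    }

    private final Supplier<T> downstream;
    private final int failureThreshold;
    private int consecutiveFailures;

    TrippingProxy(Supplier<T> downstream, int failureThreshold) {
        this.downstream = downstream;
        this.failureThreshold = failureThreshold;
    }

    // synchronized for simplicity; this serializes calls through the proxy
    synchronized T call() {
        if (consecutiveFailures >= failureThreshold) {
            throw new OpenException(); // transfer suspended: downstream is not touched
        }
        try {
            T result = downstream.get();
            consecutiveFailures = 0; // a success resets the counter
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++; // count the failed response
            throw e;
        }
    }
}
```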

  • Why is it useful?

It prevents cascading failures. In other words the transient failure of a downstream system should not be propagated to the upstream systems. By concealing the failure we are actually preventing a chain reaction (domino effect) as well.

  • How does it know when a transient failure is gone?

It must somehow determine when it would be safe to operate as a proxy again. For example, it can use the same detection mechanism that was used during the original failure detection. So it works like this: after a given period of time it allows a single request to go through and examines the response. If it succeeds, the downstream is treated as healthy. Otherwise nothing changes (no request is transferred through this proxy); only the timer is reset.

  • What states does it use?

The circuit breaker can be in any of the following states: Closed, Open, HalfOpen.

  • Closed: It allows any request. It counts successive failed requests.
    • If the successive failed count is below the threshold and the next request succeeds then the counter is set back to 0.
    • If the predefined threshold is reached then it transitions into Open
  • Open: It rejects any request immediately. It waits a predefined amount of time.
    • If that time is elapsed then it transitions into HalfOpen
  • HalfOpen: It allows only one request. It examines the response of that request:
    • If the response indicates success then it transitions into Closed
    • If the response indicates failure then it transitions back to Open
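The transitions above can be sketched as a small Java class. This is an illustration under simplifying assumptions: a fixed consecutive-failure threshold, a fixed Open duration, and a millisecond clock injected as a LongSupplier so it can be tested without sleeping. SimpleCircuitBreaker is a hypothetical name; in a Spring Boot service you would normally rely on a library such as Resilience4j rather than hand-roll this.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal three-state circuit breaker following the transitions listed above.
// The clock is injected (milliseconds) so behaviour can be tested without sleeping.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final class OpenCircuitException extends RuntimeException {
        OpenCircuitException(String message) { super(message); }
    }

    private final int failureThreshold;
    private final long openDurationMillis;
    private final LongSupplier clock;

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;
    private boolean probeInFlight;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
        this.clock = clock;
    }

    synchronized State state() { return state; }

    <T> T call(Supplier<T> action) {
        boolean isProbe = admit(); // throws OpenCircuitException when rejected
        try {
            T result = action.get();
            onSuccess(isProbe);
            return result;
        } catch (RuntimeException e) {
            onFailure(isProbe);
            throw e;
        }
    }

    // Decides whether a request may pass; returns true for the single HalfOpen trial request.
    private synchronized boolean admit() {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openDurationMillis) {
                throw new OpenCircuitException("circuit is OPEN: failing fast");
            }
            state = State.HALF_OPEN; // the waiting time has elapsed
        }
        if (state == State.HALF_OPEN) {
            if (probeInFlight) {
                throw new OpenCircuitException("circuit is HALF_OPEN: trial request already in flight");
            }
            probeInFlight = true;
            return true;
        }
        return false; // CLOSED: every request is allowed
    }

    private synchronized void onSuccess(boolean isProbe) {
        if (isProbe) {
            probeInFlight = false;
            state = State.CLOSED; // trial succeeded: downstream treated as healthy
        }
        consecutiveFailures = 0;
    }

    private synchronized void onFailure(boolean isProbe) {
        if (isProbe) {
            probeInFlight = false;
            trip(); // trial failed: back to OPEN and restart the timer
        } else if (state == State.CLOSED && ++consecutiveFailures >= failureThreshold) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        consecutiveFailures = 0;
    }
}
```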

Resiliency strategy

The above two mechanisms / policies are not mutually exclusive; on the contrary, they complement each other. They can be combined via an escalation mechanism: if the inner policy can't handle the problem, it propagates it one level up to an outer policy.

When you try to perform a request while the Circuit Breaker is Open, it will throw an exception. Your retry policy could trigger on that and adjust its sleep duration (to avoid unnecessary attempts).
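One way to wire that escalation in Java, assuming the breaker signals Open with an exception that carries its remaining open time. BreakerOpenException, BreakerAwareRetry and the injectable sleeper are hypothetical names for illustration (real libraries expose their own exception types). The outer retry stretches its sleep to the remaining open time instead of burning attempts that would be rejected anyway.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical exception thrown by a breaker while Open, carrying how long it stays Open.
final class BreakerOpenException extends RuntimeException {
    final long remainingOpenMillis;

    BreakerOpenException(long remainingOpenMillis) {
        super("circuit open for another " + remainingOpenMillis + " ms");
        this.remainingOpenMillis = remainingOpenMillis;
    }
}

// Outer retry policy: an Open breaker escalates to it, and it adjusts its sleep.
final class BreakerAwareRetry {
    private BreakerAwareRetry() {}

    static <T> T execute(Supplier<T> call, int maxRetries, long baseDelayMillis, LongConsumer sleeper) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxRetries) {
                    throw e; // out of retries: escalate further up
                }
                long delay = baseDelayMillis;
                if (e instanceof BreakerOpenException) {
                    // Trying before the breaker can go HalfOpen would be rejected anyway.
                    delay = Math.max(delay, ((BreakerOpenException) e).remainingOpenMillis);
                }
                sleeper.accept(delay); // inject a real sleep in production, a recorder in tests
            }
        }
    }
}
```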

The downstream system can also inform upstream that it is receiving too many requests with 429 status code. The Circuit Breaker could also trigger for this and use the Retry-After header's value for its sleep duration.
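A small helper for the 429 case, assuming the client already has the raw Retry-After header value as a string. Per the HTTP specification, that header carries either a number of seconds or an HTTP-date. RetryAfter.parse is a hypothetical helper, and returning Optional.empty() to mean "fall back to your default back-off" is a design choice of this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Sketch: turn a 429 response's Retry-After header into a wait duration.
// Accepts delay-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
final class RetryAfter {
    private RetryAfter() {}

    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.isBlank()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        try {
            long seconds = Long.parseLong(value);
            return seconds >= 0 ? Optional.of(Duration.ofSeconds(seconds)) : Optional.empty();
        } catch (NumberFormatException notSeconds) {
            // not a number: try the HTTP-date form below
        }
        try {
            Instant until = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, until);
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait); // a past date means "now"
        } catch (DateTimeParseException unparseable) {
            return Optional.empty(); // let the caller use its default back-off
        }
    }
}
```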

So, the whole point of this section is that you can define a protocol between client and server for overcoming transient failures together.

anandchaugule

Here's a brief overview of each mechanism:

Circuit Breaker:

  • The Circuit Breaker pattern is designed to prevent an application from repeatedly trying to execute an operation that is likely to fail.
  • It helps in protecting the system from cascading failures by temporarily stopping requests to a failing service, allowing it to recover.
  • When the circuit is open, requests are either rejected immediately or directed to a fallback method.
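The "reject or fall back" choice can be sketched in plain Java. callPermitted is a hypothetical hook standing in for the breaker's Open/Closed decision, and FallbackCall is a made-up name. In a Spring Boot application this is usually declared rather than hand-written (for example via the fallbackMethod attribute of Resilience4j's @CircuitBreaker annotation), but this sketch runs on its own. It also routes ordinary failures to the fallback, which is a design choice here.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of what happens to a request while the circuit is open:
// reject immediately (no fallback given) or degrade gracefully via a fallback.
final class FallbackCall {
    private FallbackCall() {}

    static final class RejectedException extends RuntimeException {
        RejectedException() { super("call not permitted: circuit is open"); }
    }

    static <T> T call(BooleanSupplier callPermitted, Supplier<T> primary,
                      Function<RuntimeException, T> fallback) {
        if (!callPermitted.getAsBoolean()) {
            RuntimeException rejected = new RejectedException();
            if (fallback == null) {
                throw rejected; // option 1: reject immediately
            }
            return fallback.apply(rejected); // option 2: direct to the fallback
        }
        try {
            return primary.get();
        } catch (RuntimeException e) {
            if (fallback == null) {
                throw e;
            }
            return fallback.apply(e); // ordinary failures also use the fallback
        }
    }
}
```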

Retry:

  • The Retry pattern is used to automatically re-invoke an operation that has previously failed, with the hope that the failure was temporary.
  • It is useful for handling transient failures, such as network issues, that might be resolved by retrying the operation.
  • You can configure the number of retries, backoff strategies, and other retry-related parameters.
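A sketch of one common backoff strategy, exponential backoff with a cap, plus an optional "full jitter" variant that spreads retries out randomly. Backoff and its parameter names are illustrative and not taken from any particular library's configuration.

```java
import java.util.Random;

// Exponential backoff: delay = initial * multiplier^(retry - 1), capped at maxMillis.
final class Backoff {
    private Backoff() {}

    /** Delay before the given retry (1 = first retry, i.e. the second attempt). */
    static long delayMillis(int retry, long initialMillis, double multiplier, long maxMillis) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry numbers start at 1");
        }
        double raw = initialMillis * Math.pow(multiplier, retry - 1);
        return (long) Math.min(raw, (double) maxMillis);
    }

    /** "Full jitter" variant: a random delay in [0, capped exponential delay]. */
    static long jitteredDelayMillis(int retry, long initialMillis, double multiplier,
                                    long maxMillis, Random random) {
        long cap = delayMillis(retry, initialMillis, multiplier, maxMillis);
        return (long) (random.nextDouble() * (cap + 1.0));
    }
}
```

With initial = 100 ms, multiplier = 2 and a 1000 ms cap, the retries wait 100, 200, 400, 800, then 1000 ms.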

Which approach should you choose?

  • Use circuit breaker if you want to protect your system from repeated failures and avoid overwhelming a failing service.
  • Use retry if you want to increase the chances of success for transient failures and recover from them without resorting to a circuit breaker.
  • In some scenarios, using both mechanisms might be appropriate, especially when dealing with a complex distributed system.

Abhishek Mahajan

The Retry pattern enables an application to retry an operation in hopes of success.

The Circuit Breaker pattern prevents an application from performing an operation that is likely to fail.

Retry: the Retry pattern is useful in scenarios of transient failures. What does this mean? Failures that are "temporary", lasting only a short amount of time, are transient. A momentary loss of network connectivity, a brief moment when the service goes down or is unresponsive, and related timeouts are examples of transient failures.

As the failure is transient, retrying after some time could give us the result we need.
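Deciding what counts as transient is usually the first design decision of a retry policy. The classification below is an assumption for illustration (socket timeouts and refused connections treated as transient, plus a handful of HTTP statuses); TransientFailures is a hypothetical name and the lists should be adjusted to your own clients and downstream services.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Set;

// Assumed classification of transient failures, for illustration only.
final class TransientFailures {
    private TransientFailures() {}

    // Statuses often treated as retryable (an assumption: adjust per downstream service).
    private static final Set<Integer> TRANSIENT_STATUSES = Set.of(408, 429, 502, 503, 504);

    static boolean isTransientStatus(int httpStatus) {
        return TRANSIENT_STATUSES.contains(httpStatus);
    }

    // Walks the cause chain (bounded, in case of a cycle) looking for timeouts or refused connections.
    static boolean isTransient(Throwable failure) {
        Throwable current = failure;
        for (int depth = 0; current != null && depth < 16; depth++) {
            if (current instanceof SocketTimeoutException || current instanceof ConnectException) {
                return true;
            }
            current = current.getCause();
        }
        return false; // e.g. validation errors: retrying would fail the same way
    }
}
```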

Circuit Breaker: the Circuit Breaker pattern is useful in scenarios of long-lasting faults. Consider a loss of connectivity or the failure of a service that takes some time to repair itself. In such cases, there is little point in retrying often if it is indeed going to take a while to hear back from the server. The Circuit Breaker pattern prevents an application from performing an operation that is likely to fail.

The Circuit Breaker keeps track of the number of recent failures and, based on a predetermined threshold, decides whether the request should be sent to the server under stress or not.
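One way to track recent failures against a threshold is a count-based sliding window over the last N calls. FailureRateWindow and its parameters are hypothetical; the idea is conceptually similar to the count-based sliding window offered by Resilience4j's circuit breaker, but this is not its implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Count-based sliding window over the outcomes of the last N calls (hypothetical sketch).
final class FailureRateWindow {
    private final int windowSize;
    private final int minimumCalls;
    private final double failureRateThreshold; // e.g. 0.5 means 50 %
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures;

    FailureRateWindow(int windowSize, int minimumCalls, double failureRateThreshold) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    synchronized void record(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        if (outcomes.size() > windowSize && outcomes.removeFirst()) {
            failures--; // the oldest outcome left the window and it was a failure
        }
    }

    synchronized double failureRate() {
        return outcomes.isEmpty() ? 0.0 : (double) failures / outcomes.size();
    }

    // Should requests stop being sent to the server under stress?
    synchronized boolean shouldOpen() {
        return outcomes.size() >= minimumCalls && failureRate() >= failureRateThreshold;
    }
}
```

Requiring a minimum number of calls keeps a single early failure from opening the circuit on its own.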