Polly Circuit Breaker / Retry to automatically restart queries after a network outage


I am implementing the Circuit Breaker and Retry patterns with Polly on .NET Framework 4.5.2.

I wish to see if my understanding is correct.

Question 1: If there is a network outage and the circuit breaker has reached the exceptionsAllowedBeforeBreaking number, gone into the open state and waited for the durationOfBreak period, the circuit will be open for new requests but those that have been sent will throw an exception?

Question 2: If the desired behavior is for those requests that had exceptions to be retried instead of the circuit breaker throwing an exception then the Retry policy needs to be implemented in addition to the Circuit Breaker policy. My understanding of this is that the behavior in question 1 would occur, and then the retry would be attempted.

A. If there is a network outage or the service is down and the desired behavior is for a request to be retried as soon as the network is restored or the service is up again, a RetryForever would need to be performed. Is there a better way of doing this? Effectively there would be lots of blocking, correct?

In terms of code, my policies are currently defined as:

    // Maximum number of retries after the initial attempt.
    const int maxRetryAttempts = 3;

    // Consecutive handled exceptions before the circuit opens.
    const int exceptionsAllowedBeforeBreaking = 2;
    // Seconds the circuit stays open before a trial call is permitted.
    const int pauseBetweenFailures = 2;

    // Retries immediately (no delay between attempts) on any exception.
    readonly Policy retryPolicy = Policy
        .Handle<Exception>()
        .RetryAsync(maxRetryAttempts, (exception, retryCount) => System.Diagnostics.Debug.WriteLine($"Retry {retryCount}"));

    // Opens the circuit after the configured number of consecutive exceptions.
    readonly Policy circuitBreakerPolicy = Policy
        .Handle<Exception>()
        .CircuitBreakerAsync(exceptionsAllowedBeforeBreaking: exceptionsAllowedBeforeBreaking,
                durationOfBreak: TimeSpan.FromSeconds(pauseBetweenFailures),
                onBreak: (e, span) => System.Diagnostics.Debug.WriteLine("Breaking circuit for " + span.TotalMilliseconds + "ms due to " + e.Message),
                onReset: () => System.Diagnostics.Debug.WriteLine("Trial call succeeded: circuit closing again."),
                onHalfOpen: () => System.Diagnostics.Debug.WriteLine("Circuit break time elapsed.  Circuit now half open: permitting a trial call."));

My calling code is:

    var response = await retryPolicy
        .WrapAsync(circuitBreakerPolicy)
        .ExecuteAsync(() => this.client.SendAsync<TData, JObject>(message, cancel, jsonSerializer));

I have observed that if I keep the network disconnected for longer than it takes to exhaust all the retries, the CancellationToken is cancelled and all requests fail at that point. If the network is restored before that happens, the requests are retried.

There are 2 answers

mountain traveller (BEST ANSWER)

Question 1: If there is a network outage and the circuit breaker has reached the exceptionsAllowedBeforeBreaking number, gone into the open state and waited for the durationOfBreak period, the circuit will be open for new requests ...

After the durationOfBreak has passed the circuit will transition to Half-Open state, during which a single trial call is permitted (in the current implementation).

... but those that have been sent will throw an exception?

Calls placed while the circuit is in the Open state fail immediately with BrokenCircuitException; the underlying delegate is not invoked.

Question 2: If the desired behavior is for those requests that had exceptions to be retried instead of the circuit breaker throwing an exception then the Retry policy needs to be implemented in addition to the Circuit Breaker policy. My understanding of this is that the behavior in question 1 would occur, and then the retry would be attempted.

Correct. The circuit-breaker will still throw that BrokenCircuitException (there is no 'instead' that stops the circuit-breaker doing that). However, if a wrapping retry policy handles that exception, then the BrokenCircuitException will not be propagated back to calling code. Runnable examples can be found in Polly-Samples or this dotnetfiddle.
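As a rough illustration of a retry policy wrapping a circuit breaker, here is a minimal sketch. It assumes a Polly version that exposes the IAsyncPolicy interface; the class name, the HttpClient call, and the delay values are hypothetical and not taken from the question.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical wrapper class; policies are static so the breaker state is shared across calls.
public static class RetryAroundBreakerSketch
{
    // Inner policy: opens after 2 consecutive handled exceptions, stays open for 2 seconds.
    private static readonly IAsyncPolicy Breaker = Policy
        .Handle<HttpRequestException>()
        .CircuitBreakerAsync(
            exceptionsAllowedBeforeBreaking: 2,
            durationOfBreak: TimeSpan.FromSeconds(2));

    // Outer policy: handles the underlying fault and the breaker's own exception.
    // Waiting between attempts gives the circuit time to reach Half-Open.
    private static readonly IAsyncPolicy Retry = Policy
        .Handle<HttpRequestException>()
        .Or<BrokenCircuitException>()
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: attempt => TimeSpan.FromSeconds(2));

    public static Task<string> GetAsync(HttpClient client, string url)
    {
        // Retry is outermost, so each retry attempt passes through the breaker again.
        return Retry.WrapAsync(Breaker).ExecuteAsync(() => client.GetStringAsync(url));
    }
}
```

With this arrangement a BrokenCircuitException reaches the caller only after the retry attempts are exhausted.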

A. If there is a network outage or the service is down and the desired behavior is for a request to be retried as soon as the network is restored or the service is up again, a RetryForever would need to be performed. Effectively there would be lots of blocking, correct?

A Polly policy governs only what happens on its own execution path; it is unaware of any similar executions running in parallel. So yes, if there is a RetryForever and you expect a high number of calls to loop in it while connectivity is lost, there is a risk of a memory/resource bulge from many operations sitting in a holding pattern. To know whether that is a significant concern for your application/architecture, you would need to test it in a representative environment.
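If a retry-forever loop is used anyway, a wait between attempts and a cancellation token as the way out keep each looping call cheap. The following is only a sketch under those assumptions; the class name and the backoff values are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;

// Hypothetical helper, not part of Polly.
public static class RetryForeverSketch
{
    // Waits 1s, 2s, 4s, ... capped at 30s between attempts (illustrative values).
    // Cancellation is deliberately not treated as a retryable fault.
    private static readonly IAsyncPolicy RetryForever = Policy
        .Handle<Exception>(ex => !(ex is OperationCanceledException))
        .WaitAndRetryForeverAsync(attempt =>
            TimeSpan.FromSeconds(Math.Min(30, Math.Pow(2, attempt - 1))));

    public static Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> action, CancellationToken cancel)
    {
        // Cancelling the token is the only way for a caller to leave the loop.
        return RetryForever.ExecuteAsync(action, cancel);
    }
}
```

Each call still occupies memory while it waits, so this does not remove the holding-pattern concern described above; it only bounds how often each call retries.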

Is there a better way of doing this?

You can limit the number of retries and capture failed sends into some kind of queue. When connectivity is restored you can re-send the items from the failure queue.
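A minimal sketch of that idea, using only the base class library, might look like the following. FailureQueue, SendOrQueueAsync, and DrainAsync are hypothetical names introduced for illustration, and the queue is in-memory only, so parked items would be lost on a process restart.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: bounded attempts, then park the item for a later re-send.
public sealed class FailureQueue<T>
{
    private readonly ConcurrentQueue<T> _pending = new ConcurrentQueue<T>();
    private readonly Func<T, Task> _send;
    private readonly int _maxAttempts;

    public FailureQueue(Func<T, Task> send, int maxAttempts)
    {
        _send = send;
        _maxAttempts = maxAttempts;
    }

    public int PendingCount => _pending.Count;

    // Tries to send up to maxAttempts times; queues the item if every attempt fails.
    public async Task<bool> SendOrQueueAsync(T item)
    {
        for (int attempt = 1; attempt <= _maxAttempts; attempt++)
        {
            try
            {
                await _send(item);
                return true;
            }
            catch (Exception)
            {
                // Swallowed on purpose: the loop either tries again or falls through to the queue.
            }
        }
        _pending.Enqueue(item);
        return false;
    }

    // Call when connectivity is believed to be restored. Stops at the first failure
    // so the remaining items stay queued. Assumes a single drainer at a time.
    public async Task<int> DrainAsync()
    {
        int sent = 0;
        T item;
        while (_pending.TryPeek(out item))
        {
            try
            {
                await _send(item);
            }
            catch (Exception)
            {
                break;
            }
            _pending.TryDequeue(out item);
            sent++;
        }
        return sent;
    }
}
```

In practice the bounded attempts could be a Polly retry policy instead of the hand-written loop; the loop is used here only to keep the sketch free of dependencies.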

Rudy Hinojosa

I'm not a big fan of the RetryForever policy. What if you are inserting 80,000 records into a table via INSERT statements and a network issue suddenly occurs? In no time at all you will have 80k async operations hogging your system resources.

I highly recommend a PolicyWrap using Retry, Bulkhead and CircuitBreaker. I wrote a custom SQLExtension class that has a secondary fallback connection string and a fallback command to write critical transactions to a secondary SQL server in the event of circuit-breaker exhaustion.
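A wrap of that kind could be sketched roughly as below. This is not the custom SQLExtension class mentioned above; the class name, the limits, and the choice to put the bulkhead outermost are assumptions made for illustration, and it assumes a Polly version that exposes IAsyncPolicy.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Bulkhead;

// Hypothetical helper combining Bulkhead, Retry and CircuitBreaker.
public static class BulkInsertResilienceSketch
{
    // At most 20 operations in flight, 100 more waiting for a slot (illustrative limits).
    private static readonly IAsyncPolicy Bulkhead =
        Policy.BulkheadAsync(maxParallelization: 20, maxQueuingActions: 100);

    // Handles any exception from the inner layers, including BrokenCircuitException.
    private static readonly IAsyncPolicy Retry = Policy
        .Handle<Exception>()
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(attempt));

    private static readonly IAsyncPolicy Breaker = Policy
        .Handle<Exception>()
        .CircuitBreakerAsync(
            exceptionsAllowedBeforeBreaking: 2,
            durationOfBreak: TimeSpan.FromSeconds(2));

    // Outermost first: the bulkhead also caps operations that are waiting inside a retry.
    private static readonly IAsyncPolicy Wrap = Policy.WrapAsync(Bulkhead, Retry, Breaker);

    // Returns false when the bulkhead is full, so the caller can park the work elsewhere.
    public static async Task<bool> TryExecuteAsync(Func<Task> action)
    {
        try
        {
            await Wrap.ExecuteAsync(action);
            return true;
        }
        catch (BulkheadRejectedException)
        {
            return false;
        }
    }
}
```

Placing the bulkhead outermost means excess work is rejected up front instead of piling up in retry waits; placing it innermost would instead limit only the calls actually hitting the database. Which is preferable depends on the workload.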

Something else to keep in mind: you can track specific SQL exception errors, mainly duplicates. Let's say your SQL command isn't a simple INSERT but a complex stored procedure. Once the circuit breaker is relieved, a retry would simply re-run the whole stored procedure and produce duplicate-key errors on a table with a primary key. So you may want to ignore duplicate-key errors.
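One way to express this is to exclude duplicate-key errors from what the policy handles. The sketch below assumes SQL Server and System.Data.SqlClient, where error 2627 is a primary key or unique constraint violation and 2601 is a duplicate row in a unique index; the class and member names are hypothetical.

```csharp
using System;
using System.Data.SqlClient;
using Polly;

// Hypothetical helper, not part of Polly.
public static class SqlRetrySketch
{
    private const int ConstraintViolation = 2627;   // PRIMARY KEY / UNIQUE constraint violated
    private const int DuplicateIndexKey = 2601;     // duplicate row in a unique index

    public static bool IsDuplicateKey(SqlException ex)
    {
        return ex.Number == ConstraintViolation || ex.Number == DuplicateIndexKey;
    }

    // Retries SQL failures except duplicate-key errors, which pass straight through.
    public static readonly IAsyncPolicy Retry = Policy
        .Handle<SqlException>(ex => !IsDuplicateKey(ex))
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(attempt));
}
```

An excluded exception is not swallowed: it still propagates to the caller, who can use IsDuplicateKey in a catch block to treat the write as already applied. The same predicate could be used on the circuit-breaker policy so that duplicates do not count toward breaking the circuit.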