How to accept connection loss and SQL client exceptions in SQL Azure and still live happily ever after

Dealing with a failover cluster like a database hosted in SQL Azure implies unfortunately to deal with connection issues, just because one is not connected to a single instance but to a bunch of typically three SQL server instances. Thus connection requests are routed by a load balancer to one or the other instances.

From time to time (quite often to be honest) instances in SQL Azure are switched off and replaced by new instances for maintenance reasons. This is when your client might receive connection timeout exceptions. Here are a few of the exceptions I received:

  • The client was unable to establish a connection because of an error during connection initialization process before login. Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy to accept new connections; or there was a resource limitation (insufficient memory or maximum allowed connections) on the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)
  • A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)
  • The service is currently busy. Retry the request after 10 seconds.
  • Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
    The cause of those exceptions, an instance that is not available anymore for instance, is actually pretty reasonable and in fact it is something that one need to have in mind when interacting with a cluster instead of a single server.

But how to cope with it?

Transient Fault Handling

Because I wasn’t the first who was facing this problem I found a pretty good solution based on the Microsoft patterns and practices guide, called Transient Fault Handling.

The basic idea of this solution is that there are differences in the exceptions you receive when talking to SQL Azure. Some are transient and though valid to give it another try and some simply aren’t. If you are facing a connection issue you want to try it again for a few times, a few seconds later. A syntax error, somewhere in the SQL query, on the other hand wouldn’t make sense to get tried again.

Nuget packages

All you need is to install the Transient Fault Handling Core package, if you need only Linq support, or for the Enterprise Library 5.0 Transient Fault Handling Application Block package. Just search for TransientFault in the nuget package manager console.

ItransientErrorDetectionStrategy

After the reference(s) are added you can implement the ItransientErrorDetectionStrategy interface. Which provides exact one method, named IsTransient.

There are different sample implementations on the web, some deal with SqlExcpetion error numbers, some evaluate the exception message text. I don’t prefer parsing the exception text, it seems odd to me, because those messages might be localized.

An implementation might look like this.

Public Class MyRetryStrategy
    Implements ITransientErrorDetectionStrategy

    Public Function IsTransient(ex As Exception) As Boolean Implements ITransientErrorDetectionStrategy.IsTransient
        If Not ex Is Nothing AndAlso TypeOf ex Is SqlException Then
            Dim sqEx As SqlException = CType(ex, SqlException)
            If Not sqEx Is Nothing Then
                Select Case sqEx.Number
                    Case 40197, 40501, 10053, 10054, 53, 10060, 40613, 40143, 233, 64, -2
                        Return True
                End Select
            End If
        ElseIf TypeOf ex Is TimeoutException Then
            Return True
        End If
        Return False
    End Function

End Class

A much more thoroughly implemented version can be found here.

Dealing with numbers on the other hand is tricky too, because native SQL server errors are wrapped by the .Net framework and can be provided by different sources.

I decided to check the most common error numbers, but log every exception that is checked by the IsTransient method. If the error number was not in my transient exception list, I can still add it later.

Retry aware execution

The execution of a retry aware linq statement is quite easy actually. First you need to define a RetryPolicy based on your RetryStrategy. You’d also add the amount of retries and a timespan when to start the next retry in the constructor.

Dim myRetryPolicy = New RetryPolicy(Of MyRetryStrategy)(5, TimeSpan.FromSeconds(3))

Now, that we have created a policy we can take the actual linq query and wrap it into a retry policy Action. The policy ExcecuteAction method executes consumes the linq statement as a lambda function and executes it, but now retry aware, in other words it wraps an extra try catch block around it.

e.Result = myRetryPolicy.ExecuteAction(
        Function()
            Using db = DBHelper.GenerateDataContext()
                Dim src = (From a In db.tblAccounts)
                If src Is Nothing Then
                    Throw New ArgumentException("src")
                End If
                Return src.ToList
            End Using
        End Function)

An exception, thrown inside the action will be caught by the retry policy handler and forwarded to the retry detection strategy implementation. If the exception is transient, the action will be executed again until the execution was successful. If the exception wasn’t considered as transient or the retry limit was hit, the original exception is finally thrown.

Conclusion

This enterprise application block of Microsoft’s Best Practices team is a pretty straight forward solution, easy to implement and exactly what I was looking for to work reliably with SQL Azure databases.

Advertisements