Unexpected fault on ReliableSession in NetTcpBinding (WCF)


I have a client server application. My scenario:

  • .Net Framework 4.6.1
  • Quad Core i7 machine with hyperthreading enabled
  • Server CPU load from 20 - 70 %
  • Network load < 5% (GBit NIC)
  • 100 users
  • 30 services (some administrative ones, some generic ones per datatype) running and each user is connected to all services
  • NetTcpBinding (compression enabled)
  • ReliableSession enabled
  • each second the server triggers an update notification and each client then loads approx. 100 kB from the server
  • additionally a heartbeat is running (15-second interval for testing) which simply returns the server time in UTC (a contract sketch follows this list)
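
A minimal sketch of what such a duplex setup might look like (the contract and operation names IDataService, LoadData, Heartbeat and NotifyUpdated are placeholders of mine, not taken from the question):

    using System;
    using System.ServiceModel;

    // Hypothetical duplex contract: the server pushes a one-way update
    // notification through the callback channel, the clients then pull
    // the ~100 kB of updated data and call the heartbeat periodically.
    [ServiceContract(CallbackContract = typeof(IDataServiceCallback))]
    public interface IDataService
    {
        [OperationContract]
        byte[] LoadData();        // client pulls the updated data

        [OperationContract]
        DateTime Heartbeat();     // simply returns the server time in UTC
    }

    public interface IDataServiceCallback
    {
        [OperationContract(IsOneWay = true)]
        void NotifyUpdated();     // triggered server side every second
    }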

Sometimes the WCF connections change to the faulted state. Usually when this happens the server has no network upstream at all. I took a memory dump and saw that lots of WCF threads were waiting on some WaitQueue. The call stack is:

Server stack trace: 
   at System.ServiceModel.Channels.TransmissionStrategy.WaitQueueAdder.Wait(TimeSpan timeout)
   at System.ServiceModel.Channels.TransmissionStrategy.InternalAdd(Message message, Boolean isLast, TimeSpan timeout, Object state, MessageAttemptInfo& attemptInfo)
   at System.ServiceModel.Channels.ReliableOutputConnection.InternalAddMessage(Message message, TimeSpan timeout, Object state, Boolean isLast)
   at System.ServiceModel.Channels.ReliableDuplexSessionChannel.OnSend(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.DuplexChannel.Send(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.DuplexChannelBinder.Send(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

I tweaked the settings and the situation seems to have eased - fewer clients are faulting now. My settings (sketched in code below the list):

  • ReliableSession.InactivityTimeout: 01:30:00
  • ReliableSession.Enabled: True
  • ReliableSession.Ordered: False
  • ReliableSession.FlowControlEnabled: False
  • ReliableSession.MaxTransferWindowSize: 4096
  • ReliableSession.MaxPendingChannels: 16384
  • MaxReceivedMessageSize: 1073741824
  • ReaderQuotas.MaxStringContentLength: 8388608
  • ReaderQuotas.MaxArrayLength: 1073741824
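
FlowControlEnabled, MaxTransferWindowSize and MaxPendingChannels are not exposed on NetTcpBinding.ReliableSession but on the ReliableSessionBindingElement, so a rough sketch of how these settings can be applied could look like this (the CustomBinding detour and SecurityMode.None are assumptions of mine; the compression encoder mentioned above is omitted):

    using System;
    using System.ServiceModel;
    using System.ServiceModel.Channels;

    static class Bindings
    {
        // Builds a net.tcp binding with the reliable-session values listed above.
        public static Binding BuildReliableTcp()
        {
            var tcp = new NetTcpBinding(SecurityMode.None, reliableSessionEnabled: true)
            {
                MaxReceivedMessageSize = 1073741824
            };
            tcp.ReaderQuotas.MaxStringContentLength = 8388608;
            tcp.ReaderQuotas.MaxArrayLength = 1073741824;

            // FlowControlEnabled, MaxTransferWindowSize and MaxPendingChannels
            // are only reachable on the ReliableSessionBindingElement, hence
            // the detour through a CustomBinding.
            var elements = tcp.CreateBindingElements();
            var rm = elements.Find<ReliableSessionBindingElement>();
            rm.InactivityTimeout = TimeSpan.FromMinutes(90);
            rm.Ordered = false;
            rm.FlowControlEnabled = false;
            rm.MaxTransferWindowSize = 4096;
            rm.MaxPendingChannels = 16384;

            return new CustomBinding(elements);
        }
    }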

I am stuck. Why do all calls end up waiting on some WaitQueue in the TransmissionStrategy? I do not care about messages being sent out of order (I take care of ordering myself). I have already thought about disabling reliable messaging, but the application is used in a company network worldwide and I need to know that my messages were delivered.

Any ideas how to teach WCF to just send the messages and not care about anything else?

EDIT

The values for service throttling are set to Int32.MaxValue.

I also tried setting MaxConnections and ListenBacklog (on NetTcpBinding) to their maximum values. As far as I can tell, it did not change anything.
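
For reference, a sketch of one common way to apply those throttling values in code (the helper method and where it is called are assumptions of mine; the question does not show how the values are actually set):

    using System.ServiceModel;
    using System.ServiceModel.Description;

    static class Throttling
    {
        // Raises the service throttles to Int32.MaxValue on a host that has
        // been constructed but not yet opened.
        public static void MaxOut(ServiceHost host)
        {
            var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
            if (throttle == null)
            {
                throttle = new ServiceThrottlingBehavior();
                host.Description.Behaviors.Add(throttle);
            }

            throttle.MaxConcurrentCalls = int.MaxValue;
            throttle.MaxConcurrentSessions = int.MaxValue;
            throttle.MaxConcurrentInstances = int.MaxValue;
        }
    }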

EDIT 2

The WCF traces tell me (the message is in German, so this is a rough translation) that there is no space left in the reliable messaging transfer window - after that all I get are timeouts because no more messages are sent.

What's going on there? Is it possible that the reliable messaging confuses itself?

There are 3 answers

toATwork (best answer):

Long story short:

It turns out that my WCF settings are just fine.

The ThreadPool is the limiting factor. In high-traffic (and therefore high-load) situations I generate too many messages that have to be sent to the clients. They are queued up because there are not enough worker threads to send them. At some point the queue is full - and there you are.

For more details check this question & answer from Russ Bishop.

Interesting detail: this even decreased the CPU load in high-traffic situations, from spiking wildly between 30 and 80 percent to an (almost) steady value around 30 percent. I can only assume that this is because of thread-pool thread creation and cleanup.

EDIT

I did the following:

ThreadPool.SetMinThreads(1000, 500)

Those values might be like using a sledgehammer to crack a nut - but it works.
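
Roughly what that looks like at service startup; the 1000/500 values are the sledgehammer numbers from above, while logging the previous minimum and checking the return value are additions of mine:

    using System;
    using System.Threading;

    static class Startup
    {
        // Raise the minimum number of pool threads so that bursts of outgoing
        // notifications do not queue up behind the thread pool's slow thread
        // injection.
        public static void RaiseThreadPoolMinimum()
        {
            ThreadPool.GetMinThreads(out int worker, out int io);
            Console.WriteLine($"Old minimum: {worker} worker / {io} IO threads");

            if (!ThreadPool.SetMinThreads(1000, 500))
            {
                // SetMinThreads returns false when the requested values are
                // rejected (e.g. below the processor count or above the maximum).
                Console.WriteLine("SetMinThreads was rejected");
            }
        }
    }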

oshvartz:

The wait queue can be related to WCF's built-in throttling behavior: https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/wcf/servicethrottling. The best way to troubleshoot is to enable WCF tracing and find out exactly what the root cause is.

Ackelry Xu:

Do you use connectionManagement to set the maximum number of connections for your client (if your session is duplex)? https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/network/connectionmanagement-element-network-settings

Your MaxPendingChannels is set to 16384, which lets too many clients wait in the queue; if the server cannot deal with them in time, the channels may change to the faulted state.

FlowControlEnabled determines whether messages keep being sent to the server side when the server has no space left to store them. You had better set it to true.

InactivityTimeout determines whether the session is closed when there is no message exchange within a certain period of time. You had better set it to a suitable value.

In addition, have you set your binding's timeouts?

  <netTcpBinding>
    <binding closeTimeout="" openTimeout="" receiveTimeout="" sendTimeout=""></binding>
  </netTcpBinding>
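
Equivalently in code, with placeholder timeout values (the question does not say which values are actually in use):

    // using System; using System.ServiceModel;
    var binding = new NetTcpBinding
    {
        OpenTimeout    = TimeSpan.FromMinutes(1),   // placeholder values only
        CloseTimeout   = TimeSpan.FromMinutes(1),
        SendTimeout    = TimeSpan.FromMinutes(2),   // bounds the reliable-session send wait
        ReceiveTimeout = TimeSpan.FromMinutes(10)
    };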