Unexpected fault on ReliableSession in NetTcpBinding (WCF)

Question

1k views Asked by toATwork At 11 January 2019 at 08:04

I have a client server application. My scenario:

.Net Framework 4.6.1
Quad Core i7 machine with hyperthreading enabled
Server CPU load from 20 - 70 %
Network load < 5% (GBit NIC)
100 users
30 services (some administrative ones, some generic ones per datatype) running and each user is connected to all services
NetTcpBinding (compression enabled)
ReliableSession enabled
once per second the server triggers an update notification, and every client then loads approx. 100 kB from the server
additionally, a heartbeat runs (15-second interval for testing) that simply returns the server time in UTC

Sometimes the WCF connections change to the faulted state. Usually, when this happens, the server has no outgoing network traffic at all. I took a memory dump and saw that many WCF threads were waiting on a WaitQueue. The call stack is:

Server stack trace: 
   at System.ServiceModel.Channels.TransmissionStrategy.WaitQueueAdder.Wait(TimeSpan timeout)
   at System.ServiceModel.Channels.TransmissionStrategy.InternalAdd(Message message, Boolean isLast, TimeSpan timeout, Object state, MessageAttemptInfo& attemptInfo)
   at System.ServiceModel.Channels.ReliableOutputConnection.InternalAddMessage(Message message, TimeSpan timeout, Object state, Boolean isLast)
   at System.ServiceModel.Channels.ReliableDuplexSessionChannel.OnSend(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.DuplexChannel.Send(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.DuplexChannelBinder.Send(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

I tweaked the settings and the situation seems to have eased: fewer clients fault now. My settings:

ReliableSession.InactivityTimeout: 01:30:00
ReliableSession.Enabled: True
ReliableSession.Ordered: False
ReliableSession.FlowControlEnabled: False
ReliableSession.MaxTransferWindowSize: 4096
ReliableSession.MaxPendingChannels: 16384
MaxReceivedMessageSize: 1073741824
ReaderQuotas.MaxStringContentLength: 8388608
ReaderQuotas.MaxArrayLength: 1073741824
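
For reference, a minimal sketch of one way the reliable session settings listed above could be applied in code. How the binding is actually built is not shown in the question, so this is an assumption: NetTcpBinding.ReliableSession only exposes Enabled, Ordered and InactivityTimeout, so the sketch copies the binding into a CustomBinding and sets the remaining properties on its ReliableSessionBindingElement. The class and method names are hypothetical, and the quotas and compression are left out.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

// Hypothetical helper; not taken from the original application.
public static class ReliableBindingSketch
{
    public static Binding Create()
    {
        // NetTcpBinding with the reliable session switched on.
        var tcp = new NetTcpBinding(SecurityMode.Transport, true);

        // Copy the binding so the individual binding elements can be edited.
        var custom = new CustomBinding(tcp);
        var reliable = custom.Elements.Find<ReliableSessionBindingElement>();

        // Values from the settings list in the question.
        reliable.InactivityTimeout = TimeSpan.FromMinutes(90); // 01:30:00
        reliable.Ordered = false;
        reliable.FlowControlEnabled = false;
        reliable.MaxTransferWindowSize = 4096;  // upper limit of the allowed range
        reliable.MaxPendingChannels = 16384;    // upper limit of the allowed range

        return custom;
    }
}
```

Note that 4096 and 16384 are already the largest values these two properties accept, so there is no further headroom to gain from them.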

I am stuck. Why do all calls wait on a WaitQueue in the TransmissionStrategy? I do not care about messages being sent out of order (I take care of that myself). I was already thinking about disabling reliable messaging, but the application is used in a worldwide company network, and I need to know that my messages were delivered.

Any ideas how to teach WCF to just send the messages and not care about anything else?

EDIT

The values for service throttling are set to Int32.MaxValue.

I also tried setting MaxConnections and ListenBacklog (on NetTcpBinding) to their maximum values. As far as I can tell, it did not change anything.

EDIT 2

The WCF traces tell me (German message, so this is a rough translation) that there is no space available in the reliable messaging transfer window. After that, all I get are timeouts because no more messages are sent.

What's going on there? Is it possible that reliable messaging confuses itself?

There are 3 answers

oshvartz On 11 January 2019 at 15:51

The wait queue may be related to WCF's built-in throttling behavior: https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/wcf/servicethrottling The best way to troubleshoot is to enable WCF tracing (https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/wcf/servicethrottling) so that you know exactly what the root cause is.

Ackelry Xu On 14 January 2019 at 06:43

Do you use connectionManagement to set maxconnection on your client (if your session is duplex)? https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/network/connectionmanagement-element-network-settings

Your MaxPendingChannels is set to 16384, which lets too many clients wait in the queue. If the server cannot handle the clients in time, the channels may go into the faulted state.

FlowControlEnabled controls whether the sender holds back when the receiving side has no buffer space left for more messages. You had better set it to true.

InactivityTimeout is how long a session may go without any message exchange before it is closed. You had better set it to a suitable value.

In addition, have you set your binding's timeouts?

  <netTcpBinding>
    <binding  closeTimeout="" openTimeout="" receiveTimeout="" sendTimeout="" ></binding>
  </netTcpBinding>
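
The same four timeouts can also be set in code. A minimal sketch, assuming the binding is created programmatically; the values are only placeholders (they are the documented WCF defaults, not a recommendation), and the class name is hypothetical.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper; values are placeholders to be tuned per application.
public static class TimeoutSketch
{
    public static NetTcpBinding Create()
    {
        var binding = new NetTcpBinding();

        binding.OpenTimeout = TimeSpan.FromMinutes(1);     // time allowed to open a channel
        binding.CloseTimeout = TimeSpan.FromMinutes(1);    // time allowed to close a channel
        binding.SendTimeout = TimeSpan.FromMinutes(1);     // time allowed for a send to complete
        binding.ReceiveTimeout = TimeSpan.FromMinutes(10); // idle time before the channel is dropped

        return binding;
    }
}
```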

toATwork On 17 January 2019 at 12:17 (Accepted Answer)

Long story short:

It turns out that my WCF settings are just fine.

The ThreadPool is the limiting factor. In high-traffic (and therefore high-load) situations I generate too many messages that have to be sent to the clients. They queue up because there are not enough worker threads to send them. At some point the queue is full, and there you are.

For more details check this question & answer from Russ Bishop.

Interesting detail: this even decreased the CPU load in high-traffic situations, from spiking wildly between 30 and 80 percent to an almost steady value of around 30 percent. I can only assume that this is because of thread pool thread creation and cleanup.

EDIT

I did the following:

ThreadPool.SetMinThreads(1000, 500)

Those values might be like using a sledgehammer to crack a nut, but it works.
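
A minimal sketch of how that call might look at process startup, assuming it runs once before the service hosts are opened. The class name and the logging are illustrative only; the two arguments are the minimum number of worker threads and of I/O completion port threads.

```csharp
using System;
using System.Threading;

// Hypothetical startup code; not taken from the original application.
public static class ThreadPoolSetup
{
    public static void Main()
    {
        int worker, io;
        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine("Min threads before: worker={0}, IOCP={1}", worker, io);

        // Raise the minimums so the pool creates threads on demand up to these
        // counts instead of injecting new threads slowly under bursty load.
        bool applied = ThreadPool.SetMinThreads(1000, 500);
        if (!applied)
        {
            // SetMinThreads returns false when the request is rejected,
            // e.g. when a value exceeds the configured maximum.
            Console.WriteLine("SetMinThreads was rejected; minimums unchanged.");
        }

        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine("Min threads after: worker={0}, IOCP={1}", worker, io);
    }
}
```

Checking the return value is worthwhile because a rejected request leaves the old minimums in place without throwing.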
