WebSocket connections stuck in FIN_WAIT1 and FIN_WAIT2 states


I am trying to run a server to which multiple clients open a WebSocket and send data, but many clients are unable to connect.

On the server machine, lsof and netstat -an show many connections in the FIN_WAIT1 and FIN_WAIT2 states, in addition to those in the ESTABLISHED state. The ulimit for open files is currently 1024. Do connections stuck in these two states count toward the open-file limit? If so, the 1024 limit will be exhausted very soon.

/proc/sys/net/ipv4/tcp_orphan_retries is 0, which apparently is equivalent to 8: https://serverfault.com/questions/274212/what-does-tcp-orphan-retries-set-to-0-mean/408882#408882

I have consulted this link: https://serverfault.com/questions/7689/how-do-i-get-rid-of-sockets-in-fin-wait1-state

But I don't understand much of it. I have read about these two states on the web, and I realize that they are a part of the protocol, but I'd prefer that connections not get stuck in states in which they are no longer useful. Can I do that somehow? Should I raise the ulimit? That would just mean the problem occurs at time x+y instead of x.


There is 1 answer

JWP On

Any time you see a FIN_WAIT state, or any wait state for that matter, you are looking at what we often refer to as a half session. The TCP stack follows a very strict protocol on the order of requests and responses. It is because of these rules that it knows how and when, as well as how hard, to attempt to recover by sending retries. In any wait state the stack knows it is waiting for something. Only two things will satisfy this condition: 1) some kind of proper response, or 2) a timeout.
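To see how many connections are sitting in each wait state, the kernel's connection table can be tallied directly. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` is readable; the function name `count_tcp_states` is hypothetical, and the state codes are the ones Linux uses in the `st` column of that file.

```python
from collections import Counter

# Hex state codes from the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_tcp_states(proc_net_tcp_text):
    """Count connections per TCP state from the text of /proc/net/tcp.

    Hypothetical helper: it takes the file contents as a string so it can
    be used on a saved copy as well as on the live file.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        code = fields[3].upper()
        # Unknown codes are kept as-is rather than guessed at.
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the server, calling `count_tcp_states(open("/proc/net/tcp").read())` at intervals would show whether the FIN_WAIT1 and FIN_WAIT2 counts keep growing or drain as timeouts expire. IPv6 connections are listed separately in `/proc/net/tcp6`.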

Of course, the best outcome is to receive the proper response. Work should be done to find out why there are so many waits. Sometimes they are due to unstable switching, routing, and/or other network-related activity. However, they could also be the result of a denial-of-service attack, because attackers don't care about connection state. The only way that the necessary resources at the application layer can be released is when the application regains control. TCP only gives control back when 1) the workflow is normal, or 2) a timeout or other abnormal condition has happened. For example, FINs and RSTs can be sent out of sequence and at any time, and both are considered to trump any other state. Keep in mind that not all clients or hosts act the same way, as we are talking about different TCP stack implementations.
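One way an application can make sure it regains control is to put its own idle timeout on each socket, so that a silent peer cannot hold a descriptor indefinitely. The sketch below works on a plain TCP socket and is only an illustration: `read_or_close` and its default values are hypothetical, and a WebSocket library would normally expose its own timeout or keepalive settings instead.

```python
import socket

def read_or_close(conn, idle_timeout=30.0, bufsize=4096):
    """Return received bytes, or close the socket and return None.

    The socket is closed (releasing its file descriptor) when the peer
    has closed its side or when nothing arrives within idle_timeout
    seconds. Both defaults are illustrative assumptions.
    """
    conn.settimeout(idle_timeout)
    try:
        data = conn.recv(bufsize)
    except socket.timeout:
        data = b""  # treat an idle peer the same as a closed one
    if not data:
        conn.close()
        return None
    return data
```

A server loop would call this per connection and drop the connection from its bookkeeping whenever `None` comes back.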

Depending on the system, some, many, or very few of the TCP stack parameters can be configured. There are configurable parameters for timeout values on FIN waits as well as RST waits. On Linux, for example, net.ipv4.tcp_fin_timeout controls how long an orphaned connection may stay in FIN_WAIT2. Perhaps you can adjust these to solve your issue.
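Before changing anything, it helps to check the current values. This is a small sketch, assuming a Linux host that exposes these settings under `/proc/sys/net/ipv4`; the helper name `read_tcp_tunables` is hypothetical, and the two default names are just the settings mentioned in this thread.

```python
from pathlib import Path

def read_tcp_tunables(base="/proc/sys/net/ipv4",
                      names=("tcp_fin_timeout", "tcp_orphan_retries")):
    """Read integer TCP settings from a sysctl directory.

    Returns a dict mapping each name to its integer value, or to None
    when the file is missing, unreadable, or not a single integer.
    """
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values
```

Calling `read_tcp_tunables()` on the server returns the values in effect. Changing them requires root, for example with `sysctl -w net.ipv4.tcp_fin_timeout=<seconds>`, and the right value depends on the workload.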