I am trying to run a server to which multiple clients need to open a websocket and send data, but it looks like many clients are not able to make a connection.
On the server machine, when I run lsof or netstat -an, I see a lot of connections in the FIN_WAIT1 and FIN_WAIT2 states, apart from the connections in the ESTABLISHED state. The ulimit for open files is currently 1024.
Would the connections that are stuck in these two states be counted toward the open-file limit? If so, the 1024 limit will be exhausted very soon.
/proc/sys/net/ipv4/tcp_orphan_retries is 0, which is equivalent to 8, it seems: https://serverfault.com/questions/274212/what-does-tcp-orphan-retries-set-to-0-mean/408882#408882
I have consulted this link: https://serverfault.com/questions/7689/how-do-i-get-rid-of-sockets-in-fin-wait1-state
But I don't understand much of it. I have read about these two states on the web, and I realize that they are a part of the protocol, but I'd prefer that connections not get stuck in states in which they are not useful. Can I do that somehow? Should I change the ulimit? That would just mean that the problem occurs at time x+y instead of x.
Any time you see a FIN_WAIT state, or any wait state for that matter, you are looking at what we often call a half session. The TCP stack follows a very strict protocol for the order of requests and responses. Those rules are how it knows when, how, and how hard to attempt recovery by sending retries. In any wait state the stack knows it is waiting for something, and only two things will satisfy that condition: 1) a proper response or 2) a timeout.
Of course, the best outcome is to receive the proper response, so work should be done to find out why there are so many waits. Sometimes the cause is unstable switching, routing, or other network-related activity. It could also be the result of a denial-of-service attack, because attackers don't care about state. The resources held at the application layer can be released only when the application regains control, and TCP hands control back only when 1) the workflow is normal or 2) a timeout or other abnormal condition has occurred. For example, FINs and RSTs can be sent out of sequence and at any time, and both are considered to trump any other state. Keep in mind that not all clients or hosts behave the same way, since they run different TCP stack implementations.
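As a starting point for finding out where the waits come from, here is a minimal Python sketch that tallies sockets per TCP state from the text of /proc/net/tcp on Linux. The function name count_states is made up for illustration. It also counts how many sockets in each state still report a non-zero inode; my understanding is that a socket the application has already closed shows inode 0 there and so no longer occupies a file descriptor, but treat that reading as an assumption to verify on your own system.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(lines):
    """Return two Counters keyed by state name: all sockets, and
    sockets whose inode column is non-zero."""
    total, with_inode = Counter(), Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything too short to parse.
        if len(fields) < 10 or not fields[0].rstrip(":").isdigit():
            continue
        state = TCP_STATES.get(fields[3].upper(), fields[3])
        total[state] += 1
        # Column 9 is the inode. Assumption: 0 means the socket is no
        # longer attached to any process's file descriptor.
        if fields[9] != "0":
            with_inode[state] += 1
    return total, with_inode


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as fh:
            total, with_inode = count_states(fh)
    except OSError:
        total, with_inode = Counter(), Counter()  # not on Linux
    for state, n in sorted(total.items()):
        print(f"{state:12} total={n:5} with_inode={with_inode[state]:5}")
```

If most of the FIN_WAIT1 and FIN_WAIT2 entries show up only in the total column, they are probably not what is consuming the 1024 descriptors, and the ESTABLISHED count is the number to compare against the limit. IPv6 sockets are listed separately in /proc/net/tcp6, which has the same column layout.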
Depending on the system, some, many, or very few of the TCP stack parameters can be configured. There are configurable timeout values for FIN waits as well as RST waits. Perhaps you can adjust these to solve your issue.
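Before adjusting anything, it helps to see the current values. The sketch below assumes a Linux host, where, as far as I know, tcp_fin_timeout governs how long an orphaned connection may sit in FIN_WAIT2 and tcp_orphan_retries influences how long a locally closed connection keeps retrying. It reads both from /proc/sys/net/ipv4 and prints the process's open-file limit alongside them. The helper name read_tcp_settings is hypothetical.

```python
import os
import resource


def read_tcp_settings(base="/proc/sys/net/ipv4",
                      names=("tcp_fin_timeout", "tcp_orphan_retries")):
    """Read integer sysctl values from `base`.

    Entries that are missing or unparsable map to None.
    """
    settings = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as fh:
                settings[name] = int(fh.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    # The soft limit is what `ulimit -n` reports for this process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit: soft={soft} hard={hard}")
    for name, value in read_tcp_settings().items():
        print(f"{name} = {value}")
```

Changing the values is a separate step that normally requires root (for example, writing to the same files or using sysctl). Lowering a timeout only shortens how long stuck connections linger; it does not address why the peer never completes the close.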