I have a very odd problem in a proxy cluster of four Squid proxies:
One of the machines is the master. The master runs ldirectord, which checks the availability of all four machines and distributes new client connections.
All of a sudden, after years of operation, I'm encountering this problem:
1) The machine serving the master role is not being assigned new connections; old connections are served until a new proxy is assigned to the clients.
2) The other machines are still processing requests, taking over the clients from the master (so far, so good).
3) "ipvsadm -L -n" shows ever-decreasing ActiveConn and InActConn values.
Once I migrate the master role to another machine, "ipvsadm -L -n" shows lots of active and inactive connections again, until, after about an hour, the same thing happens on the new master.
Datapoint: This happened again this afternoon, and now "ipvsadm -L -n" shows:
TCP  141.42.1.215:8080 wlc persistent 1800
  -> 141.42.1.216:8080  Route  1   98  0
  -> 141.42.1.217:8080  Route  1  135  0
  -> 141.42.1.218:8080  Route  1    1  0
  -> 141.42.1.219:8080  Route  1    2  0
The numbers have not changed for quite some time now.
Some more stats (ipvsadm -L --stats -n):
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port     Conns    InPkts   OutPkts   InBytes  OutBytes
  -> RemoteAddress:Port
TCP  141.42.1.215:8080   1990351  87945600         0    13781M         0
  -> 141.42.1.216:8080    561980  21850870         0     2828M         0
  -> 141.42.1.217:8080    467499  23407969         0     3960M         0
  -> 141.42.1.218:8080    439794  19364749         0     2659M         0
  -> 141.42.1.219:8080    521378  23340673         0     4335M         0
The value for "Conns" is now constant for all realservers and for the virtual server. Traffic is still flowing (InPkts is increasing).
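For anyone who wants to watch for the same symptom, here is a rough Python sketch that compares the "Conns" column of two "ipvsadm -L --stats -n" snapshots and lists the realservers whose counter did not move. The function names are my own, the "after" snapshot is made up for illustration, and the sketch assumes "Conns" is printed as a plain integer (ipvsadm's --exact option should avoid abbreviated values such as "13781M").

```python
def conns_by_server(stats_output):
    """Map each realserver (addr:port) to its Conns counter."""
    conns = {}
    for line in stats_output.splitlines():
        fields = line.split()
        # Realserver rows: -> addr:port Conns InPkts OutPkts InBytes OutBytes
        # The "-> RemoteAddress:Port" header has only two fields and is skipped.
        if len(fields) == 7 and fields[0] == "->":
            conns[fields[1]] = int(fields[2])
    return conns


def stalled_servers(before, after):
    """Return realservers whose Conns value is identical in both snapshots."""
    old, new = conns_by_server(before), conns_by_server(after)
    return sorted(addr for addr in new if old.get(addr) == new[addr])


# First snapshot: two rows taken from the output above.
before = """\
  -> 141.42.1.216:8080 561980 21850870 0 2828M 0
  -> 141.42.1.217:8080 467499 23407969 0 3960M 0
"""
# Second snapshot: hypothetical values, invented for this example only.
after = """\
  -> 141.42.1.216:8080 561980 21850999 0 2828M 0
  -> 141.42.1.217:8080 467512 23408100 0 3960M 0
"""

print(stalled_servers(before, after))
```

In the situation described here, every realserver would show up in the list, because InPkts keeps growing while Conns stays put.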
I examined the output of "ipvsadm -L -n -c" and found:
25 FIN_WAIT
534 NONE
977 ESTABLISHED
Then I waited a minute and got:
21 FIN_WAIT
515 NONE
939 ESTABLISHED
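The per-state counts above can be produced with a few lines of Python. This is only a sketch: it assumes the usual six-column layout of "ipvsadm -L -n -c" (pro, expire, state, source, virtual, destination), the function name is mine, and the client addresses in the sample are invented.

```python
from collections import Counter


def count_states(ipvsadm_output):
    """Count connection entries per state in `ipvsadm -L -n -c` output."""
    counts = Counter()
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with the protocol name;
        # the two header lines do not match and are skipped.
        if len(fields) == 6 and fields[0] in ("TCP", "UDP"):
            counts[fields[2]] += 1
    return counts


# Illustrative sample; the source addresses are made up.
sample = """\
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:52  ESTABLISHED 10.0.0.5:40112     141.42.1.215:8080  141.42.1.216:8080
TCP 01:10  FIN_WAIT    10.0.0.6:40200     141.42.1.215:8080  141.42.1.217:8080
TCP 28:03  NONE        10.0.0.5:0         141.42.1.215:8080  141.42.1.216:8080
"""

for state, n in sorted(count_states(sample).items()):
    print(n, state)
```

Running it a minute apart and comparing the two results gives the same kind of before/after view as the numbers quoted above.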
It turns out that a local BIRD installation was injecting a route for the IP of the virtual server, which took precedence over ARP.
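A quick way to spot this kind of thing is to look for routes that cover the virtual IP, most specific first. The Python sketch below works on text copied from "ip route show"; the function name is mine, and the gateway, interface and source address in the sample are invented. Routes installed by a routing daemon are often tagged with a "proto" keyword such as "proto bird", but that depends on the local iproute2 configuration, so treat it as a hint rather than proof.

```python
import ipaddress


def routes_covering(vip, ip_route_output):
    """Return (network, line) pairs for routes containing vip, most specific first."""
    target = ipaddress.ip_address(vip)
    matches = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            net = ipaddress.ip_network(dest, strict=False)
        except ValueError:
            # Lines that start with a route type keyword are ignored in this sketch.
            continue
        if target in net:
            matches.append((net, line.strip()))
    matches.sort(key=lambda m: m[0].prefixlen, reverse=True)
    return matches


# Illustrative routing table; gateway, device and src are assumptions.
sample_routes = """\
default via 141.42.1.1 dev eth0
141.42.1.0/24 dev eth0 proto kernel scope link src 141.42.1.216
141.42.1.215 via 141.42.1.1 dev eth0 proto bird
"""

for net, line in routes_covering("141.42.1.215", sample_routes):
    print(net, "|", line)
```

A host route (/32) for the virtual IP at the top of the list, coming from something other than the kernel, is the pattern to look for.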