We have a Shibboleth native SP 2.5.4 that has been running for a few years without any issues. Yesterday I had to update a certificate for one of the IdPs and restart the services. Since that restart I've been getting intermittent errors:
Cannot connect to shibd process, a site adminstrator should be notified.
The errors appear to occur in bursts, as shown by the number of errors per minute:
errors | time
58 Sep 22 09:56
82 Sep 22 10:53
82 Sep 22 11:16
80 Sep 22 11:17
89 Sep 22 11:37
71 Sep 22 11:38
130 Sep 22 11:43
47 Sep 22 11:44
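For anyone who wants to reproduce this kind of per-minute tally, here is a minimal Python sketch. It assumes the errors are counted from a log whose lines start with a `YYYY-MM-DD HH:MM:SS LEVEL` prefix, as in native_warn.log; the function name `errors_per_minute` is only illustrative, and this is not necessarily how the numbers above were produced.

```python
import re
from collections import Counter

# Assumed line prefix: "2020-09-22 11:54:13 ERROR ..." (timestamp, then level).
# The capture group keeps the timestamp truncated to the minute.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} \w+ ")

# Text of the error being counted, as it appears in the log.
NEEDLE = "Cannot connect to shibd process"


def errors_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of matching errors."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open file object, for example `errors_per_minute(open("/var/log/shibboleth-www/native_warn.log"))`, and the resulting counter sorted by key to see where the bursts fall.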
Restarting httpd and shibd didn't resolve the issue. SELinux is disabled.
In /var/log/shibboleth-www/native_warn.log I have:
2020-09-22 11:54:13 ERROR Shibboleth.Listener [15798] shib_check_user: socket call (connect) resulted in error (2): no message
2020-09-22 11:54:13 WARN Shibboleth.Listener [15798] shib_check_user: cannot connect socket (21)...
2020-09-22 11:54:13 CRIT Shibboleth.Listener [15798] shib_check_user: socket server unavailable, failing
2020-09-22 11:54:13 ERROR Shibboleth.Apache [15798] shib_check_user: Cannot connect to shibd process, a site adminstrator should be notified.
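On Linux, errno 2 is ENOENT, which for a connect() on a Unix domain socket usually means the socket path did not exist at that moment. A small probe like the one below could be run in a loop during a burst to check whether the socket file comes and goes. This is only a sketch: the path `/var/run/shibboleth/shibd.sock` is an assumption (a common default), the real location depends on the listener configuration in shibboleth2.xml, and `probe` is a hypothetical helper name.

```python
import errno
import socket

# Assumption: a common default location; adjust to match the listener
# settings in shibboleth2.xml on the affected host.
SOCKET_PATH = "/var/run/shibboleth/shibd.sock"


def probe(path, timeout=2.0):
    """Try to connect to a Unix stream socket; return 'ok' or the errno name."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except OSError as exc:
        # e.g. 'ENOENT' if the path is missing, 'ECONNREFUSED' if nothing listens.
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

Calling `probe(SOCKET_PATH)` as the apache user would show whether the web server's view of the socket matches what shibd created. A result of `ENOENT` would match the error (2) in the log, while `EACCES` or `ECONNREFUSED` would point in a different direction.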
Memory and CPU look good to me:
top - 12:08:08 up 25 days, 22:33, 2 users, load average: 1.01, 1.03, 1.01
Tasks: 294 total, 1 running, 293 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32880188k total, 4712256k used, 28167932k free, 426772k buffers
Swap: 5242876k total, 0k used, 5242876k free, 1993996k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16876 apache 20 0 370m 10m 4160 S 0.7 0.0 0:00.20 httpd
17418 shibd 20 0 4894m 58m 8084 S 0.7 0.2 0:01.46 shibd
2401 root 20 0 3116m 270m 19m S 0.3 0.8 128:41.88 cylancesvc
17519 apache 20 0 370m 10m 3948 S 0.3 0.0 0:00.12 httpd
17766 apache 20 0 370m 10m 3872 S 0.3 0.0 0:00.13 httpd
Any idea what could cause this?