Apache fcgid php "Working" idle php processes


We are having an issue where Apache (2.4.10) FCGID (2.3.9) PHP processes are getting stuck in "Working" State on Debian.

These PHP processes are idle and occupy no system resources beyond the memory footprint left over from processing previous requests. They are still attached to the correct logical parent process (the apache2 process handling requests on this vhost).

Attaching strace shows each of them blocked in the call "accept(0," which we presume means it is listening for the next request.

Application logging added to our PHP handle_shutdown function shows that every one of these requests reached handle_shutdown without error, as you would expect for any request PHP handles (handle_shutdown always runs). So, to the best of our knowledge, the entire request "succeeded", and a 200 response is logged in the Apache access log.

However, the fcgid section of apachectl fullstatus shows the process as "Working" rather than "Ready".

Changing the Fcgid recycling settings (max requests, lifetime, timeouts, set either higher or lower) does not seem to affect how often this occurs.

An apachectl graceful successfully cleans up all the idle "Working" processes and gets things back to normal.

However, if we leave this unwatched, every process sooner or later ends up in this state, until we have a completely idle server where all of our max processes (100) are stuck "Working" but idle. At this point memory usage is reasonable, and CPU, network et al. drop to negligible, as the only request the server will respond to is the fullstatus (because it does not hit the PHP vhost section).


There is 1 answer

Meh-Lindi On

Well.

It turns out that the first suggested response (change the apache2 Mutex mode from "file:" to "sem") was correct. But when it was first applied, our apache2 services were restarted rather than cold started, so of course the new Mutex mode was not actually in use.

Thus, when testing still showed the error, this suggestion was written off.


What's happening?

The apache2 fcgid PM (process manager) process keeps three lists of all its current child processes: "Ready", "Working" and "Error/Exiting".

It uses a mutex (a lock) to protect these lists whenever it moves a child process's information block from "Ready" to "Working" as it hands that child a request to process, and again when the child finishes the request and is moved back to "Ready".

A mutex protects a shared resource that is accessed from multiple threads or processes by allowing only one of them at a time to touch it. Without that, one process could overwrite the resource while another is also reading or writing it, so that the other process either reads an inconsistent value or has its own write lost.

It seems the default "file:" mutexes on Debian are not up to the job: very occasionally (as suggested elsewhere on the 'net), two status-change requests occur at the same time, and one change succeeds whilst the other (concurrent) change gets "lost".

The result is a child that knows it has finished and a parent that thinks it hasn't.
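
To make the "lost" change concrete, here is a small sketch that replays the bad interleaving deterministically. It is only a toy model: the dict of lists, the pids 101 and 102, and the move() helper are made up for illustration and are not how mod_fcgid actually stores its process table.

```python
import copy

def move(table, pid, src, dst):
    """Return a copy of table with pid moved from list src to list dst."""
    new = copy.deepcopy(table)
    new[src].remove(pid)
    new[dst].append(pid)
    return new

# Toy process table (hypothetical): two children are busy with requests.
shared = {"ready": [], "working": [101, 102]}

# Broken mutex: both updaters read the table before either one writes.
snap_a = copy.deepcopy(shared)  # updater A reads
snap_b = copy.deepcopy(shared)  # updater B reads the same, soon stale, state
shared_unlocked = move(snap_a, 101, "working", "ready")  # A writes
shared_unlocked = move(snap_b, 102, "working", "ready")  # B overwrites A's write

# Working mutex: each read-modify-write completes before the next one starts.
shared_locked = move(shared, 101, "working", "ready")
shared_locked = move(shared_locked, 102, "working", "ready")

print(shared_unlocked)  # {'ready': [102], 'working': [101]}
print(shared_locked)    # {'ready': [101, 102], 'working': []}
```

In the unprotected run, child 101 has finished its request but is still listed as "working", which is the same symptom we saw in fullstatus.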

Grrr!


Moral: on Debian, change your mutex mode if using apache2 with fcgid - and make sure you do a full apache stop followed by apache start, and don't trust your server admins who have only done an apache restart!
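
For reference, a sketch of what the change can look like. Treat the details as assumptions to check against your own install: the file location is the usual Debian one, the commented-out stock line is what Debian normally ships as far as I know, and fcgid-pipe and fcgid-proctbl are the mutex names mod_fcgid registers according to the Apache Mutex documentation.

```apache
# /etc/apache2/apache2.conf (usual Debian location; adjust if yours differs)

# Stock Debian setting, file-based locking:
#   Mutex file:${APACHE_LOCK_DIR} default

# Use a semaphore-based mutex for everything instead:
Mutex sem default

# Alternative: change only the mod_fcgid mutexes and leave the rest alone.
# Mutex sem fcgid-pipe fcgid-proctbl
```

After editing, do a full apachectl stop followed by apachectl start, not a restart, for the reason given above.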