What causes mysterious hanging threads in ColdFusion -> MySQL communication


One of the more interesting "features" in ColdFusion is how it handles external requests. The basic gist is that when a query is made to an external source through <cfquery>, or through any other external request like that, CF passes the request on to a specific driver, and from that point CF itself is unable to suspend it. Even if a timeout is specified on the query or in the cfsetting, it is flatly ignored for all external requests.

http://www.coldfusionmuse.com/index.cfm/2009/6/9/killing.threads

So with that in mind, the issue we've run into is that the communication between our CF server and our MySQL server sometimes goes awry and leaves behind hung threads. They have the following characteristics.

  1. The hung thread shows up in CF and cannot be killed from FusionReactor.
  2. There is no hung thread visible in MySQL, and no active running query (just the usual sleeps).
  3. The database is responding to other calls and appears to be operating correctly.
  4. Max connections have not been reached for the DB nor the user.

It seems to me the only likely explanation is that CF makes a request and MySQL responds to it, but with an answer that CF somehow ignores, so CF keeps the thread open waiting for a response from MySQL. That would explain why the database shows no signs of problems while CF keeps a thread open waiting for the mysterious answer.
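To make that hypothesis concrete, here is a minimal Java sketch of the mechanism (not taken from CF or the MySQL driver, just plain sockets). It assumes a local peer that accepts the TCP connection but never sends a byte, which stands in for a response that never reaches the client. The class and method names (`SilentPeerDemo`, `readTimesOut`) are made up for illustration. With no read timeout the `read()` call would block indefinitely, which is what a hung thread looks like from the JVM side; with a read timeout set it gives up with a `SocketTimeoutException`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SilentPeerDemo {

    /**
     * Connects to a local listener that never answers, then tries to read
     * one byte with the given read timeout (SO_TIMEOUT) in milliseconds.
     * Returns true if the read gave up with a timeout.
     * Caution: a timeout of 0 means "wait forever", so the call never returns.
     */
    public static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                                        silentServer.getLocalPort())) {
            // The OS completes the TCP handshake even though accept() is
            // never called, so the client believes it has a live peer.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();      // nothing will ever arrive
                return false;   // only reached if the peer sent data or closed
            } catch (SocketTimeoutException e) {
                return true;    // the read was abandoned after timeoutMillis
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

The point of the sketch is only that a blocking socket read has no deadline of its own: unless something at the socket or driver level sets one, the waiting thread stays parked regardless of what the server thinks it has already sent.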

Usually these hung threads appear randomly on otherwise working scripts (such as posting a comment on a news article). Even while one thread is hung for that script, other requests for that script will go through, which would imply that the script isn't necessarily at fault, but rather the conditions under which it was executed.

We ran some tests to determine that it was not a MySQL-generated max_connections error: we created a user, limited it to 1 max connection, tied up that connection with a sleep(1000) query, and executed another query. Unfortunately, it correctly errored out without generating a hung thread.

So, I'm left at this point with absolutely no clue what is going wrong. Is there some other connection limit or timeout which could be causing the communication between the servers to go awry?


There are 3 answers

1
Owen Allen On BEST ANSWER

Long story short, I believe the cause was ColdFusion 8's image processing. It was just buggy, and now on CF9 I have never seen the problem again.

0
Ben Doom On

We had a similar problem with a MS SQL server. There, the root cause was a known issue in which, for some reason, the server thinks it's shutting down, and the thread hangs (even though the server is, obviously, not shutting down).

We weren't able to eliminate the problem, but were able to reduce it by turning off pooled DB connections and fiddling with the connection refresh rate. (I think I got that label right -- no access to administrator at my new employment.) Both are in the connection properties in Administrator.

Just a note: The problem isn't entirely with CF. The problem, apparently, affects all Java apps. Which does not, in any way, reduce how annoyed I get by this.

0
Daniel Sellers On

One of the things you should start to look at is the hardware between the two servers. It is possible that you have a router, bridge, or NIC that is dropping occasional packets. This can result in the MySQL box thinking it has completed the task while the CF server sits there and waits indefinitely for a complete response, creating a hung thread.

3com has some details on testing for packet loss here: http://support.3com.com/infodeli/tools/netmgt/tncsunix/product/091500/c11ploss.htm#22128
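If packet loss does turn out to be the cause, one possible way to stop a lost response from pinning a thread forever is a socket-level timeout in the JDBC driver itself. The sketch below assumes the datasource uses MySQL Connector/J, whose `connectTimeout` and `socketTimeout` connection properties are given in milliseconds. The helper name, host, database, and timeout values are all made up for illustration, and whether your CF datasource setup lets you pass these properties through is something to verify in your own Administrator.

```java
public class MySqlUrlSketch {

    /**
     * Builds a Connector/J style JDBC URL with driver-level timeouts.
     * connectTimeout bounds the TCP connect; socketTimeout bounds each
     * blocking read on the connection. Both are in milliseconds.
     */
    public static String urlWithTimeouts(String host, int port, String database,
                                         int connectTimeoutMillis,
                                         int socketTimeoutMillis) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?connectTimeout=" + connectTimeoutMillis
                + "&socketTimeout=" + socketTimeoutMillis;
    }

    public static void main(String[] args) {
        // Hypothetical host and database names, example values only.
        System.out.println(
                urlWithTimeouts("db.example.internal", 3306, "news", 5000, 60000));
    }
}
```

Keep in mind that a socket timeout also cuts off legitimately slow queries, so it would need to be set above the longest query you expect to run.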