ColdFusion memory leak when calling cfc directly with no method argument


We've had a memory problem for a long time. I've finally tracked down how to replicate the issue but I do not know what is causing it or how to fix it.

We have a number of CFCs in a web-accessible /controller directory that handle form submits and processing. When a CFC is called directly with no method argument, the server begins to chew up memory.

For instance, a URL like http://www.domain.com/controller/LoginController.cfc will run in the browser until it times out. The /CFIDE directory has been locked down and is not publicly accessible, so the CFC explorer is not (or should not be) available.

We use FusionReactor to monitor our instances. Our servers are set for 20GB of heap space. On a fresh restart after loading the application, memory will cruise around 800MB. With normal traffic, memory will fluctuate between 5GB and 10GB with regular garbage collection. After a while, the server eventually reaches 98% capacity. It tends to run fine there for hours or sometimes even days, until some spike in traffic pushes it over and an OutOfMemoryError occurs. Garbage collection recovers no memory, and FusionReactor reports no active long-running threads. Only a server restart will recover it.

Using FusionReactor (which we've only just installed, and which is how I finally got some insight into this issue), I inspected the PermGen memory space and found that it accounted for 85% of the heap. This didn't seem right at all. I performed a memory dump and loaded it into MAT (the Eclipse Memory Analyzer) to analyze it. I found that there were 10 objects in memory measuring 1.7GB each (10 x 1.7GB = 17GB, approx 85% of the 20GB heap). These objects look like this:

Class Name |  Shallow Heap | Retained Heap | Percentage
byte[1769628928] @ 0x4d963b198  ...128.................POST......../controller/LoginController.cfc......../controller/LoginController.cfc........173.14.93.66........173.14.93.66........www.domain.com........443........HTTP/1.1.......;D:\websites\domain\system\controller\Lo...| 1,769,628,944 | 1,769,628,944 |   8.60%

So I restarted CF on one of our servers, checked FusionReactor, and saw no memory usage. Then I went to a browser and first called the cfc like this:

http://www.domain.com/controller/LoginController.cfc?method=foo

This resulted in the onMissingMethod handler properly kicking in and redirecting to the appropriate error page, with no effect on the server.

But then calling this:

http://www.domain.com/controller/LoginController.cfc

This resulted in a page hang. FusionReactor reports no active requests even though one is running, which is why we couldn't identify the problem while it was happening. Worse, refreshing the memory view shows it slowly increasing by tenths of a percent with no reported activity. The timeout on the server is set to 5 minutes. I'm assuming that the request eventually gets killed and its buffer is then orphaned at 1.7GB. This didn't bring down the server; it just spiked the memory, which then ran at a flat 3GB where garbage collection recovers nothing. This seems to explain why, over time, random calls to these URLs slowly chew up and hold onto memory.

Next I called the URL from multiple browser tabs. This spiked the memory almost instantaneously to 98%. FusionReactor now showed two long-running requests, at 10 seconds and climbing, even though there were over 15 browser tabs running. Force killing the thread seemed to do nothing. Only a server restart solved the problem.

So now I've identified the issue specifically (phantom threads creating huge orphaned objects in PermGen heap) and how to replicate the issue.

How or why requests are made directly to the cfc I have no idea. Possibly bots or occasional weird browser behavior.

All the huge objects are instances of jrun.servlet.jrpp.ProxyEndpoint.

What specifically is causing this issue, and how do I fix it?

This is CF 9.0.1 Standard on Windows Server 2003 running Java 1.7.0_25.

Thanks!

Screenshot of MAT analysis of heap dump


There are 3 answers

0
Joe Rinehart On

I know it'd represent a big shift in how you do things, but I've always avoided allowing CF to unnecessarily create CFCs. Unless they've changed how things work (I last played with this several versions ago), hitting the CFC directly causes a new instance to be created.

If you're up for a small test, maybe try setting up a simple front controller/delegate .cfm page and moving the CFCs within 'controller' to the application scope. There are certainly more elegant architectures to handle it (short of moving to a full-bore framework), but you could:

Use Application.cfc to set an instance of something (like LoginController) into the application scope and then use a simple "invoke.cfm" page that basically expects the name of one of these application-scoped CFCs to invoke along with parameters. Something like (just for example's sake):

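<cfsilent>
<!--- Default to empty strings so a missing URL parameter does not throw --->
<cfparam name="url.controllerName" default="" />
<cfparam name="url.methodName" default="" />
<cfset ctlName = url.controllerName />
<cfset methodName = url.methodName />
<cfset response = "" />

<!--- Look up the desired single-cfc controller --->
<cfif len(methodName) and structKeyExists( application.controllers, ctlName ) >
  <!--- Bracket notation uses the value of ctlName; dot notation would look for a key literally named "ctlName" --->
  <cfset ctl = application.controllers[ctlName] />

  <!--- Now ask it to do something - note that I'm not validating the method... --->
  <cfinvoke component="#ctl#" method="#methodName#" argumentCollection="#form#" returnVariable="response" />
</cfif>
</cfsilent><cfoutput>#response#</cfoutput>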

Note that this'd cause your 'controllers' to be stateful and thread-safety would need to be considered (but should already be, anyhow).
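
For the Application.cfc half of that idea, a minimal sketch might look like the following. The application name and the component path "controller.LoginController" are assumptions for illustration only; adjust them to match your own mappings.

```cfml
<cfcomponent output="false">
    <!--- Hypothetical application name --->
    <cfset this.name = "controllerDemo" />

    <cffunction name="onApplicationStart" returnType="boolean" output="false">
        <!--- Create each controller once and keep it in the application scope --->
        <cfset application.controllers = structNew() />
        <!--- Assumed dotted path to the CFC in the /controller directory --->
        <cfset application.controllers["LoginController"] = createObject("component", "controller.LoginController") />
        <cfreturn true />
    </cffunction>
</cfcomponent>
```

With this in place, invoke.cfm finds the shared instance by name in application.controllers instead of CF creating a new one for every request.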

1
Adrian J. Moreno On

Perhaps you could use onCFCRequest in your Application.cfc to monitor this issue.

It would still create the object, but you could log the request, then the CFABORT should stop the request dead in its tracks.

<cffunction name="oncfcRequest" returnType="void"> 
    <cfargument type="string" name="cfcname"> 
    <cfargument type="string" name="method"> 
    <cfargument type="struct" name="args"> 
    <cfif arguments.method IS "">
        <cflog .... />
        <cfabort />
    </cfif>
</cffunction>
2
crazy4mustang On

I believe this is a legit bug in ColdFusion and I've reported it through their bug system. The problem is partially repeatable on other systems. For instance, on my MBP running CF on Apache, a direct CFC call does not cause a memory issue but does result in an immediate JRun 'Internal Server Error' page. So something is going wrong, and the systems are handling the problem differently. Anyway...

I've found a workaround thanks to @iKnowKungFoo and lots of experimentation.

Inserting a 'method' key/value into the URL scope seems to solve the problem. The caveat is that it has to be done in the onRequestStart method and not in the onCFCRequest method. From the docs it seems that a call to a CFC would go directly to onCFCRequest, but this does not seem to be the case. All requests go through the onRequestStart method first. Only when onRequestStart returns is onCFCRequest called, AND only if the required 'method' argument exists.

So in this case, onCFCRequest was never being called anyway because the 'method' argument never existed. So here is the code that runs in onRequestStart immediately:

<cfif Right(arguments.targetPage,4) IS ".cfc"
      AND NOT StructKeyExists(URL,"WSDL")
      AND NOT StructKeyExists(URL,"method")
      AND NOT StructKeyExists(FORM,"method")>
    <cfset StructInsert(FORM,"method","")>
    <cfset StructInsert(URL,"method","")>
</cfif>

This bit of code checks the extension of the requested page, and if a method argument does not exist in either the URL or FORM scope, it inserts a blank key/value pair into both for good measure. The check for the 'WSDL' argument is there because I found that while this code worked perfectly, the few web service cfc calls we have suddenly broke. If the call to the cfc is WebService.cfc?WSDL then the method argument is not required and CF handles the whole thing differently.

So inserting the empty 'method' value causes onCFCRequest to be called properly on the completion of onRequestStart. When the cfc is invoked with the invalid empty method name, onMissingMethod is now properly kicked off. That method promptly handles the bad page request and redirects to a custom error page.
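
To show where the check sits, here is a minimal sketch of an Application.cfc with the snippet placed inside onRequestStart. The application name is hypothetical, and a real Application.cfc would carry its other settings and handlers alongside this.

```cfml
<cfcomponent output="false">
    <!--- Hypothetical application name --->
    <cfset this.name = "exampleApp" />

    <cffunction name="onRequestStart" returnType="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true" />

        <!--- Direct .cfc request with no method and no WSDL flag: supply an empty method --->
        <cfif Right(arguments.targetPage,4) IS ".cfc"
              AND NOT StructKeyExists(URL,"WSDL")
              AND NOT StructKeyExists(URL,"method")
              AND NOT StructKeyExists(FORM,"method")>
            <cfset StructInsert(FORM,"method","")>
            <cfset StructInsert(URL,"method","")>
        </cfif>

        <!--- Returning true lets the request continue --->
        <cfreturn true />
    </cffunction>
</cfcomponent>
```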

Since implementing this fix we've seen memory usage go down on all servers from a consistent 98% to 15%. Memory graphs show expected sawtoothing of memory being used and collected. Overall performance has gone from an average page request time of 1200ms to 54ms without all these requests running rampant behind the scenes.

Still I hope Adobe is able to identify and fix the problem.