Request time peaks, how to find bottleneck

501 views Asked by At

I'm trying to figure out the bottleneck(s) in a web application on a production server. There are about 80 sites running on ColdFusion 10 with the latest Java 8, Tomcat 7, IIS 8.5 on Windows Server 2012 R2. Requests randomly peak from average 500-1000 ms to up to 30 seconds, meaning 9 out of 10 requests of the same page are completed within 1000 ms and one takes 20-30 seconds. It can happen on pretty much every page/template with code, it seems.

Web app characteristics:

  • lots of regular expressions and string manipulation
  • lots of data transformation (list to array, recreating arrays, randomizing data order, generating structs)
  • lightweight SQL queries (dedicated SQL server in the same network)
  • no outgoing connections (<cfhttp>)

I'm monitoring using FusionReactor and I observed the following things:

Stacktrace of a slow request

Note: This is just an example. Other hanging requests show the gap in other functions. There doesn't seem to be a pattern, sometimes it's a function that contains a query, sometimes it's just a <cfinclude>, sometimes it's just plain output.

stacktrace

Thread CPU Time is 2500 ms while execution time is 29727 ms. TTFB and TTLB is 29726 ms, stream opened 2 ms before and closed 1 ms after. Query Total Time was 76 ms (19 queries).

Notice how funcCONSTRUCTFOOTER.getAccess is actually just a single line Java function return 0; (returning the access attribute of the <cffunction> tag), so I really doubt that the execution of this function took 23 seconds. I don't see any lock/wait either though.

More examples: stacktrace stacktrace stacktrace

JVM (Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121)

-server
-Xms24G
-Xmx48G
-Xss4m
-XX:MaxMetaspaceSize=2G
-XX:+TieredCompilation
-XX:ReservedCodeCacheSize=2G
-XX:+UseCompressedOops
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
-Xbatch 

ColdFusion (10.0.22.283922, Update 22)

Maximum number of simultaneous Template requests: 40
Maximum number of cached templates: 40000 (~14500 cached according to monitor)
Trusted cache: true
Cache template in request: true
Component cache: true
Save class files: true

Connector

<Connector port="8012" protocol="AJP/1.3" redirectPort="8445" tomcatAuthentication="false" maxThreads="500" connectionTimeout="60000" />

worker.list=cfusion
worker.cfusion.type=ajp13
worker.cfusion.host=localhost
worker.cfusion.port=8012
worker.cfusion.connection_pool_size=500
worker.cfusion.max_reuse_connections=250
worker.cfusion.connection_pool_timeout=60

Average in metrics.log ranges from 35 to 75 threads. Busy count is about 90% of the value, e.g. Max threads: 500 Current thread count: 70 Current thread busy: 64.

Hardware

  • HyperV VM on Intel Xeon E5-2650
  • 4 physical cores assigned, 8 logical cores
  • 64 GB RAM assigned

What could be the reason? How can I inspect the issue further?

0

There are 0 answers