I have a OData REST API application running on IIS, using http.sys as the web server. I notice for around 5-10% of the requests my clients are noticing high latency (> 300 ms), but when I look at server side application logs, I see I processed the request in less than 50 ms. My clients are NOT geographically far from the web server. I think there is some non-network bottleneck, but I don't know where to look.
So far, I ran an Http.Sys ETW trace on the machine to capture a single request with high client reported latency. Here are the results:
Client Reported Latency: 323 ms
Server application reported processing time: 38 ms
Http.Sys trace:
1.RecvReq: 2023-08-14T22:33:39.653675000Z (“Request received”)
2.Parse: 2023-08-14T22:33:39.653723100Z (“Parsed request with URI …”)
3.Deliver: 2023-08-14T22:33:39.653755300Z (“Delivered request to server application from request queue IISPOOL_<AppName>”)
4.Deliver: 2023-08-14T22:33:39.653811300Z (“Delivered request to server application from request queue IISPOOL_<AppName>”)
5.RecvResp: 2023-08-14T22:33:39.697113000Z (“Server application passed response”) (EntityChunkCount: 23, HeaderLength: 329, StatusCode: 200)
6.RecvBody: 2023-08-14T22:33:39.701603000Z (“Server application passed entity body”)
7.FastRespLast: 2023-08-14T22:33:39.859224500Z (“Server Application passed the last response”)
8.FastSend: 2023-08-14T22:33:39.859284800Z (“Queued last response for sending. Status code is 200”)
I don't know fully how to interpret these logs, so I am not finding them helpful. (If you know where I can find resources to help interpret these logs, please share that).
Do these logs reveal anything insightful, like there is some bottleneck in my application logic, or maybe on the client side? What additional thing you would you suggest I look at next? I also checked performance counters on my machine (CPU/Memory) and it was low.
You can record CPU and Network traces with ETW. Download ETWController:
Note: ETWController and ETWAnalyzer are written by me.
Start it as Administrator and select as Preset CSWitch
Then give the recording file a descriptive file name like
After that select Manual Editing to tweak the recording settings
Change the Profile to also record Network Profiling with this command line
Now press Start and let your clients have slow response times. This generates quite a bit of data (1-2 GB/minute) in a up to 4 GB Ring buffer so your time horizon of the recording is no more than 1-2 minutes. Be sure to stop recording soon enough after something bad did happen.
When the recording is finished you can analyze the ETW data with a command line tool called ETWAnalyzer.
Download and extract it to a folder and add the PATH to the folder it so you can execute the tool from anywhere.
Now you can extract the relevant data with
The first time you should use the
-symsver msoption to download Microsoft symbols for all Windows code. When you did not patch/update Windows in between you can omit this next time which will extract the data quicker.Now you can check TCP metrics with an IP filter of one of your client IPs
If there are showing up Retrans Count/%/Delay then TCP retransmissions did occur which indicate network overload. Your reported number of 300ms is the default TCP retransmission delay for Windows TCP sockets connected to the Internet. For more information see https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/DumpTCPCommand.md
If it is not the network you should consider using WPA (Windows Performance Analyzer) to look deeper inside IIS if there are some waits due to e.g. AV solutions. A generic query could help also like
to get some first indications if some Antivirus solution is interfering in one of your w3wp.exe processes.