HttpClient resulting in leaking Node<Object> in mscorlib


Consider the following program, in which the HttpRequestMessage, HttpResponseMessage, and HttpClient objects are all disposed properly. It always ends up holding about 50MB of memory at the end, after collection. Add a zero to the number of requests, and the unreclaimed memory doubles.

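    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program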
    {
        static void Main(string[] args)
        {
            var client = new HttpClient { 
                   BaseAddress = new Uri("http://localhost:5000/")};

            var t = Task.Run(async () =>
            {
                var resps = new List<Task<HttpResponseMessage>>();
                var postProcessing = new List<Task>();

                for (int i = 0; i < 10000; i++)
                {
                    Console.WriteLine("Firing..");
                    var req = new HttpRequestMessage(HttpMethod.Get,
                                                        "test/delay/5");
                    var tsk = client.SendAsync(req);
                    resps.Add(tsk);
                    postProcessing.Add(tsk.ContinueWith(async ts =>
                    {
                        req.Dispose();
                        var resp = ts.Result;
                        var content = await resp.Content.ReadAsStringAsync();
                        resp.Dispose();
                        Console.WriteLine(content);
                    }));
                }

                await Task.WhenAll(resps);
                resps.Clear();
                Console.WriteLine("All requests done.");
                await Task.WhenAll(postProcessing);
                postProcessing.Clear();
                Console.WriteLine("All postprocessing done.");
            });

            t.Wait();
            Console.Clear();

            var t2 = Task.Run(async () =>
            {
                var resps = new List<Task<HttpResponseMessage>>();
                var postProcessing = new List<Task>();

                for (int i = 0; i < 10000; i++)
                {
                    Console.WriteLine("Firing..");
                    var req = new HttpRequestMessage(HttpMethod.Get,
                                                        "test/delay/5");
                    var tsk = client.SendAsync(req);
                    resps.Add(tsk);
                    postProcessing.Add(tsk.ContinueWith(async ts =>
                    {
                        var resp = ts.Result;
                        var content = await resp.Content.ReadAsStringAsync();
                        Console.WriteLine(content);
                    }));
                }

                await Task.WhenAll(resps);
                resps.Clear();
                Console.WriteLine("All requests done.");
                await Task.WhenAll(postProcessing);
                postProcessing.Clear();
                Console.WriteLine("All postprocessing done.");
            });

            t2.Wait();
            Console.Clear();
            client.Dispose();

            GC.Collect();
            Console.WriteLine("Done");
            Console.ReadLine();
        }
    }

On a quick investigation with a memory profiler, it seems that the objects that take up the memory are all of the type Node<Object> inside mscorlib.

My initial thought was that it was some internal dictionary or stack, since those are the types that use a Node as an internal structure, but I was unable to turn up any results for a generic Node<T> in the reference source, since this is actually a Node<object> type.

Is this a bug, or some kind of expected optimization (I wouldn't consider memory that is always retained in proportion to the load to be an optimization in any way)? And, purely academically, what is the Node<Object>?

Any help in understanding this would be much appreciated. Thanks :)

Update: To extrapolate the results for a much larger test set, I optimized it slightly by throttling it.

Here's the changed program. It now seems to stay consistent at 60-70MB for a 1 million request set. I'm still baffled at what those Node<object>s really are, and why it is allowed to maintain such a high number of irreclaimable objects.

The difference between these two results leads me to guess that this may not really be an issue with HttpClient or WebRequest, but rather something rooted directly in async, since the real variable between these two tests is the number of incomplete async tasks that exist at a given point in time. This is mere speculation from a quick inspection.

static void Main(string[] args)
{

    Console.WriteLine("Ready to start.");
    Console.ReadLine();

    var client = new HttpClient { BaseAddress = 
                    new Uri("http://localhost:5000/") };

    var t = Task.Run(async () =>
    {
        var resps = new List<Task<HttpResponseMessage>>();
        var postProcessing = new List<Task>();

        for (int i = 0; i < 1000000; i++)
        {
            //Console.WriteLine("Firing..");
            var req = new HttpRequestMessage(HttpMethod.Get, "test/delay/5");
            var tsk = client.SendAsync(req);
            resps.Add(tsk);
            var n = i;
            postProcessing.Add(tsk.ContinueWith(async ts =>
            {
                var resp = ts.Result;
                var content = await resp.Content.ReadAsStringAsync();
                if (n%1000 == 0)
                {
                    Console.WriteLine("Requests processed: " + n);
                }

                //Console.WriteLine(content);
            }));

            if (n%20000 == 0)
            {
                await Task.WhenAll(resps);
                resps.Clear();
            }

        }

        await Task.WhenAll(resps);
        resps.Clear();
        Console.WriteLine("All requests done.");
        await Task.WhenAll(postProcessing);
        postProcessing.Clear();
        Console.WriteLine("All postprocessing done.");
    });

    t.Wait();
    Console.Clear();
    client.Dispose();

    GC.Collect();
    Console.WriteLine("Done");
    Console.ReadLine();
}

There are 3 answers

StuS On

We had the same problems when we used System.Net.WebRequest for HTTP requests. The size of the w3wp process ranged from 4 to 8 GB because our load is not constant: sometimes we have 10 requests per second, and at other times 1000. Of course, the buffers are not reused in such a scenario.

We replaced System.Net.WebRequest with System.Net.Http.HttpClient everywhere it was used, because HttpClient doesn't have any buffer pools.

If you send many requests through your HttpClient, make it a static variable to avoid socket leaks.
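A minimal sketch of the static-client advice, assuming the same local test server as in the question. The class name SharedHttp and the method GetDelayAsync are illustrative names, not part of any library:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical holder class; the name SharedHttp is illustrative only.
public static class SharedHttp
{
    // One HttpClient for the lifetime of the process, so connections are reused
    // instead of being opened and torn down per request.
    public static readonly HttpClient Client = new HttpClient
    {
        // Assumption: the test server from the question.
        BaseAddress = new Uri("http://localhost:5000/")
    };

    public static async Task<string> GetDelayAsync()
    {
        // The response is still disposed per request; only the client is shared.
        using (var resp = await Client.GetAsync("test/delay/5"))
        {
            return await resp.Content.ReadAsStringAsync();
        }
    }
}
```

Callers would then use SharedHttp.GetDelayAsync() rather than creating a new HttpClient for each request.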


I think a simpler way to analyze this problem is to use PerfView. It can show the reference tree, so you can find the root cause of your problem.


user10101 On

We encountered a similar issue with the PinnableBufferCache becoming too large and leading to OutOfMemoryExceptions.


Andrew Au's analysis stopped at the point that the cache is static "and is not released when all requests are done". But the more interesting question, "Under what conditions is it released?", was still open.

According to the sources, it is trimmed on a Gen2 GC event, subject to some other conditions which are pretty tricky (e.g. not more often than every 10 msec, etc.): https://referencesource.microsoft.com/#System/parent/parent/parent/InternalApis/NDP_Common/inc/PinnableBufferCache.cs,203

My experiments have shown that if the process survives the memory usage spike and the load (i.e. the number of HTTP requests) decreases, then the cache size decreases as well over time.

In our case, we found that we can greatly optimize the amount of content loaded via HTTP.

I think alternative solutions might be making more free virtual memory available to the process, or throttling the load when memory usage is too high.
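One way to throttle the load can be sketched as follows. Note that this caps the number of requests in flight with a SemaphoreSlim rather than reacting to memory usage, which is a simpler stand-in for the idea above. The names Throttle, RunAsync, and maxInFlight are illustrative, not from any library:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; caps concurrency instead of measuring memory.
public static class Throttle
{
    // Runs work(i) for i = 0..count-1 with at most maxInFlight operations running at once.
    public static async Task RunAsync(int count, int maxInFlight, Func<int, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = new List<Task>(count);
            for (int i = 0; i < count; i++)
            {
                // Wait for a free slot before starting the next operation.
                await gate.WaitAsync();
                tasks.Add(RunOneAsync(i, work, gate));
            }
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunOneAsync(int i, Func<int, Task> work, SemaphoreSlim gate)
    {
        try { await work(i); }
        finally { gate.Release(); } // Free the slot even if the operation fails.
    }
}
```

With an HttpClient, the work delegate would send one request and read its response. Fewer concurrent requests means fewer buffers are needed at the same time, so the cache has less reason to grow.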

Andrew Au On

Let’s investigate the problem with all the tools we have in hand.

First, let’s take a look at what those objects are. To do that, I put the given code in Visual Studio and created a simple console application. Alongside it, I ran a simple HTTP server on Node.js to serve the requests.

After running the client to the end, I attached WinDBG to it, inspected the managed heap, and got these results:

0:037> !dumpheap
Address       MT     Size
02471000 00779700       10 Free
0247100c 72482744       84     
...
Statistics:
      MT    Count    TotalSize Class Name
...
72450e88      847        13552 System.Collections.Concurrent.ConcurrentStack`1+Node[[System.Object, mscorlib]]
...

The !dumpheap command dumps all objects in the managed heap. That could include objects that should be freed (but have not been yet, because the GC has not kicked in). In our case, that should be rare, because we called GC.Collect() just before the printout and nothing else should run after it.

The ConcurrentStack`1+Node line above is worth noticing. That should be the Node object you are referring to in the question.

Next, let’s look at the individual objects of that type. We grab the MT value of that type and invoke !dumpheap again with -mt, which filters the output down to only the objects we are interested in.

0:037> !dumpheap -mt 72450e88   
 Address       MT     Size
025b9234 72450e88       16     
025b93dc 72450e88       16     
...

Now we grab a random one from the list and ask the debugger why this object is still on the heap by invoking the !gcroot command as follows:

0:037> !gcroot 025bbc8c
Thread 6f24:
    0650f13c 79752354 System.Net.TimerThread.ThreadProc()
        edi:  (interior)
            ->  034734c8 System.Object[]
            ->  024915ec System.PinnableBufferCache
            ->  02491750 System.Collections.Concurrent.ConcurrentStack`1[[System.Object, mscorlib]]
            ->  09c2145c System.Collections.Concurrent.ConcurrentStack`1+Node[[System.Object, mscorlib]]
            ->  09c2144c System.Collections.Concurrent.ConcurrentStack`1+Node[[System.Object, mscorlib]]
            ->  025bbc8c System.Collections.Concurrent.ConcurrentStack`1+Node[[System.Object, mscorlib]]

Found 1 unique roots (run '!GCRoot -all' to see all roots).

Now it is quite obvious that we have a cache, and that the cache maintains a stack, with the stack implemented as a linked list. Looking further, the reference source shows how that list is used. Before that, let’s first inspect the cache object itself, using !DumpObj:

0:037> !DumpObj 024915ec 
Name:        System.PinnableBufferCache
MethodTable: 797c2b44
EEClass:     795e5bc4
Size:        52(0x34) bytes
File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
724825fc  40004f6        4        System.String  0 instance 024914a0 m_CacheName
7248c170  40004f7        8 ...bject, mscorlib]]  0 instance 0249162c m_factory
71fe994c  40004f8        c ...bject, mscorlib]]  0 instance 02491750 m_FreeList
71fed558  40004f9       10 ...bject, mscorlib]]  0 instance 025b93b8 m_NotGen2
72484544  40004fa       14         System.Int32  1 instance        0 m_gen1CountAtLastRestock
72484544  40004fb       18         System.Int32  1 instance 605289781 m_msecNoUseBeyondFreeListSinceThisTime
7248fc58  40004fc       2c       System.Boolean  1 instance        0 m_moreThanFreeListNeeded
72484544  40004fd       1c         System.Int32  1 instance      244 m_buffersUnderManagement
72484544  40004fe       20         System.Int32  1 instance      128 m_restockSize
7248fc58  40004ff       2d       System.Boolean  1 instance        1 m_trimmingExperimentInProgress
72484544  4000500       24         System.Int32  1 instance        0 m_minBufferCount
72484544  4000501       28         System.Int32  1 instance        0 m_numAllocCalls

Now we see something interesting: the stack is actually used as a free list for the cache. The source code tells us how the free list is used, in particular in the Free() method shown below:

http://referencesource.microsoft.com/#mscorlib/parent/parent/parent/parent/InternalApis/NDP_Common/inc/PinnableBufferCache.cs

/// <summary>
/// Return a buffer back to the buffer manager.
/// </summary>
[System.Security.SecuritySafeCritical]
internal void Free(object buffer)
{
  ...
  m_FreeList.Push(buffer);
}

So that is it: when the caller is done with the buffer, it returns it to the cache, the cache puts it on the free list, and the free list is then used to serve later allocations:

[System.Security.SecuritySafeCritical]
internal object Allocate()
{
  // Fast path, get it from our Gen2 aged m_FreeList.  
  object returnBuffer;
  if (!m_FreeList.TryPop(out returnBuffer))
    Restock(out returnBuffer);
  ...
}
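The Free/Allocate pattern above can be mimicked with a much simplified stand-in. This is a sketch for illustration only, not the real System.PinnableBufferCache (it has no pinning, restocking, or trimming); the names TinyBufferCache and CacheHolder are made up:

```csharp
using System;
using System.Collections.Concurrent;

// Simplified stand-in for a buffer cache; NOT the real System.PinnableBufferCache.
public sealed class TinyBufferCache
{
    private readonly ConcurrentStack<object> _freeList = new ConcurrentStack<object>();
    private readonly Func<object> _factory;

    public TinyBufferCache(Func<object> factory) { _factory = factory; }

    public int FreeCount { get { return _freeList.Count; } }

    public object Allocate()
    {
        object buffer;
        // Fast path: reuse a previously freed buffer; otherwise create a new one.
        return _freeList.TryPop(out buffer) ? buffer : _factory();
    }

    public void Free(object buffer)
    {
        // Each Push allocates one internal ConcurrentStack<object> node,
        // which is the kind of object that shows up in the heap dump.
        _freeList.Push(buffer);
    }
}

public static class CacheHolder
{
    // A static field is a GC root, so the cache and everything on its
    // free list stay reachable for the lifetime of the process.
    public static readonly TinyBufferCache Cache =
        new TinyBufferCache(() => new byte[512]);
}
```

In this sketch, buffers returned through Free() remain on the free list after GC.Collect(), because the static field keeps the whole chain of stack nodes reachable.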

Last but not least, let’s understand why the cache itself is not freed when we are done with all those HTTP requests. By adding a breakpoint on mscorlib.dll!System.Collections.Concurrent.ConcurrentStack.Push(), we see the following call stack (this could be just one of the cache's use cases, but it is representative):

mscorlib.dll!System.Collections.Concurrent.ConcurrentStack<object>.Push(object item)
System.dll!System.PinnableBufferCache.Free(object buffer)
System.dll!System.Net.HttpWebRequest.FreeWriteBuffer()
System.dll!System.Net.ConnectStream.WriteHeadersCallback(System.IAsyncResult ar)
System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken)
System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken)
System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken)
System.dll!System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* nativeOverlapped)
mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP)

At WriteHeadersCallback, we are done with writing the headers, so we return the buffer to the cache. At this point the buffer is pushed back onto the free list, and therefore a new stack node is allocated. The key thing to notice is that the cache object is a static member of HttpWebRequest.

http://referencesource.microsoft.com/#System/net/System/Net/HttpWebRequest.cs

...
private static PinnableBufferCache _WriteBufferCache = new PinnableBufferCache("System.Net.HttpWebRequest", CachedWriteBufferSize);
...
// Return the buffer to the pinnable cache if it came from there.   
internal void FreeWriteBuffer()
{
  if (_WriteBufferFromPinnableCache)
  {
    _WriteBufferCache.FreeBuffer(_WriteBuffer);
    _WriteBufferFromPinnableCache = false;
  }
  _WriteBufferLength = 0;
  _WriteBuffer = null;
}
...

So there we go, the cache is shared across all requests and is not released when all requests are done.