Large String in Large Object Heap causes issues - but in any case it has to end up as a String

2.2k views Asked by At

I am following up from this question here

The problem I have is that I have some large objects coming from an MSMQ mainly Strings. I have narrowed down my memory problems to these objects being created in the Large Object Heap (LOH) and therefore fragmenting it (confirmed that with some help from the profiler).

In the question I posted above I got some workarounds mainly in the form of splitting up the String into char arrays which I did.

The problem I am facing is that at the end of the string processing (in whatever form that is) I need to send that string to another system which I have no control over. So I was thinking of the following solution to have this String placed in the LOH:

  1. Represent it as an array of char arrays less than 85k each (threshold of Objects to be placed in the LOH)
  2. Compress it on the sender end (i.e. before receiving it in the system we are talking about here which is the receiver) and decompress it only before passing it in the third party system.

Whatever I do - one way or another - the String will have to be complete (no char arrays or compressed).

Am I stuck here? I am thinking if using a managed environment was a mistake here and whether we should bite the bullet and go for a C++ kind of environment.

Thanks, Yannis

EDIT: I have narrowed down the problem to exactly the code posted here

The large string that comes through is placed in the LOH. I have removed every single processing module from point where i have received the message onwards and the memory consumption trend remains the same.

So I guess i need to change the way this WorkContext is passed around between systems.

3

There are 3 answers

2
Jens On BEST ANSWER

You maybe could implement a class (call it LargeString), that reuses previously assigned strings and keeps a small collection of them.

Since strings normally are immutable, you'd have to do every change and new assignment by unsafe pointer juggling. After passing a string to the reciever, you'd need to manually mark it as free for reuse. Different message lengths might also be a problem, unless the reciever can cope with messages that are too long, or you have a collection of strings of every length.

Probably not a great idea, but maybe beats rewriting everything in C++.

0
xanatos On

You can try streaming the values using a StringBuilder (the 4.0 version that uses a rope-like implementation).

This example must be executed in Release mode and with the Start Without Debugging attached (CTRL-F5). Both Debug mode and Start Debugging mess too much with the GC.

public class SerializableWork
{
    // This is very often between 100-120k bytes. This is actually a String - not just for the purposes of this example
    public String WorkContext { get; set; }

    // This is quite large as well but usually less than 85k bytes. This is actually a String - not just for the purposes of this example
    public String ContextResult { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Initial memory: {0}", GC.GetTotalMemory(true));
        var sw = new SerializableWork { WorkContext = new string(' ', 1000000), ContextResult = new string(' ', 1000000) };
        Console.WriteLine("Memory with objects: {0}", GC.GetTotalMemory(true));

        using (var mq = new MessageQueue(@".\Private$\Test1"))
        {
            mq.Send(sw);
        }

        sw = null;

        Console.WriteLine("Memory after collect: {0}", GC.GetTotalMemory(true));

        using (var mq = new MessageQueue(@".\Private$\Test1"))
        {
            StringBuilder sb1, sb2;

            using (var msg = mq.Receive())
            {
                Console.WriteLine("Memory after receive: {0}", GC.GetTotalMemory(true));

                using (var reader = XmlTextReader.Create(msg.BodyStream))
                {
                    reader.ReadToDescendant("WorkContext");
                    reader.Read();

                    sb1 = ReadContentAsStringBuilder(reader);

                    reader.ReadToFollowing("ContextResult");
                    reader.Read();

                    sb2 = ReadContentAsStringBuilder(reader);

                    Console.WriteLine("Memory after creating sb: {0}", GC.GetTotalMemory(true));
                }
            }

            Console.WriteLine("Memory after freeing mq: {0}", GC.GetTotalMemory(true));

            GC.KeepAlive(sb1);
            GC.KeepAlive(sb2);
        }

        Console.WriteLine("Memory after final collect: {0}", GC.GetTotalMemory(true));
    }

    private static StringBuilder ReadContentAsStringBuilder(XmlReader reader)
    {
        var sb = new StringBuilder();
        char[] buffer = new char[4096];

        int read;

        while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) != 0)
        {
            sb.Append(buffer, 0, read);
        }

        return sb;
    }
}

I read directly the Message.BodyStream of the message in an XmlReader and then I go to the elements I need and I read the data in chunks using XmlReader.ReadValueChunk

In the end nowhere I use string objects. The only big block of memory is the Message.

0
DanH On

Well your options depend on how the 3rd party system is receiving data. If you can stream to it somehow then you don't have to have it all in memory in one go. If that is the case then compressing (which will probably really help your network load if its easily compressible data) is great as you can decompress through a stream and punt it to the 3rd party system in chunks.

The same of course would work if you split your strings up to go below LoH threshold.

If not then I would still advocate splitting the payload on the MSMQ message, and then using a memory pool of prealloacted and reused byte arrays for the re-assembly before sending it to the client. Microsoft has an implementation you can use http://msdn.microsoft.com/en-us/library/system.servicemodel.channels.buffermanager.aspx

The final option I can think of, is to handle the msmq deserialisation in unmanaged code in C++ and create your own custom large block memory pool using placement new to deserialise the strings into that. You could keep it relatively simple by ensuring your pool buffers are sufficient for the longest message possible rather than trying to be clever and dynamic which is hard.