The C#/.NET application I am working on makes use of huge byte arrays and is running into memory fragmentation issues; I checked the memory usage using CLRMemory.

The code we use is as follows:
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument("myLoadedDocument.pdf");

    // Operations on the PDF document

    using (var stream = new MemoryStream())
    {
        loadedDocument.Save(stream);
        loadedDocument.Close(true);
        return stream.ToArray(); // byte[]
    }
We use similar code in multiple places across our application, and we call it in a loop to generate bulk audits ranging from a few hundred to tens of thousands of documents.

- Is there a better way to handle this so as to avoid fragmentation?
As part of the audits, we also download large files from Amazon S3 using the following code:
    using (var client = new AmazonS3Client(_accessKey, _secretKey, _region))
    {
        var getObjectRequest = new GetObjectRequest
        {
            BucketName = "bucketName",
            Key = "keyName"
        };

        using (var downloadStream = new MemoryStream())
        using (var response = await client.GetObjectAsync(getObjectRequest))
        using (var responseStream = response.ResponseStream)
        {
            await responseStream.CopyToAsync(downloadStream);
            return downloadStream.ToArray(); // byte[]
        }
    }
- Is there a better alternative for downloading large files without them landing on the large object heap (LOH), which is taking a toll on the garbage collector?
There are two different things here: MemoryStream itself, and the ToArray() call.

For what happens inside MemoryStream: it is implemented over a simple byte[], but you can mitigate a lot of the overhead of that by using RecyclableMemoryStream instead, via the Microsoft.IO.RecyclableMemoryStream NuGet package, which re-uses buffers between independent usages.
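As an illustration, a minimal sketch of that swap applied to the question's PDF code (the PdfExporter class and SavePdf wrapper are assumptions; the manager is typically shared process-wide so that buffers are actually recycled):

    using System.IO;
    using Microsoft.IO;

    public static class PdfExporter
    {
        // One manager per process; it owns and recycles the pooled buffers.
        private static readonly RecyclableMemoryStreamManager StreamManager =
            new RecyclableMemoryStreamManager();

        // Hypothetical wrapper around the question's save code.
        // PdfLoadedDocument comes from the question's PDF library.
        public static byte[] SavePdf(PdfLoadedDocument loadedDocument)
        {
            using (var stream = StreamManager.GetStream())
            {
                loadedDocument.Save(stream);
                loadedDocument.Close(true);
                return stream.ToArray(); // still allocates one big array; the next point addresses that
            }
        }
    }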
For ToArray(), frankly: don't do that. When using a vanilla MemoryStream, the better approach is TryGetBuffer(...), which gives you the oversized backing buffer along with the start/end tokens; it is then your job not to look outside those bounds. If you want to make that easier, consider treating the segment as a span (or memory) instead, as sketched below.
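A minimal sketch, assuming a plain MemoryStream whose buffer is exposable (true for the parameterless constructor used in the question); the Process(...) consumer is hypothetical:

    using System;
    using System.IO;

    using (var stream = new MemoryStream())
    {
        loadedDocument.Save(stream); // the question's PDF save

        if (stream.TryGetBuffer(out ArraySegment<byte> segment))
        {
            // segment.Array is the oversized backing buffer; only the range
            // segment.Offset .. segment.Offset + segment.Count holds real data.
            Process(segment.Array, segment.Offset, segment.Count);

            // Easier to stay in bounds: view the segment as a span or memory.
            ReadOnlySpan<byte> span = segment.AsSpan();
            ReadOnlyMemory<byte> memory = segment;
        }
    }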
This TryGetBuffer(...) approach, however, does not work well with RecyclableMemoryStream, as it makes a defensive copy to prevent problems with independent data. In that scenario, you should treat the stream simply as a stream, i.e. Stream: just write to it, rewind it (Position = 0), have the consumer read from it, and dispose it when they are done; a sketch follows.
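Applied to the S3 download in the question, that advice might look like the following sketch (the AuditDownloader class and DownloadAsync name are assumptions; the caller reads the returned stream and disposes it, which returns the pooled buffers):

    using System.IO;
    using System.Threading.Tasks;
    using Amazon.S3;
    using Amazon.S3.Model;
    using Microsoft.IO;

    public class AuditDownloader
    {
        private static readonly RecyclableMemoryStreamManager StreamManager =
            new RecyclableMemoryStreamManager();

        private readonly IAmazonS3 _client;

        public AuditDownloader(IAmazonS3 client) => _client = client;

        // Returns a rewound Stream instead of byte[]; the caller reads it
        // and disposes it when done.
        public async Task<Stream> DownloadAsync(string bucketName, string key)
        {
            var request = new GetObjectRequest { BucketName = bucketName, Key = key };
            var downloadStream = StreamManager.GetStream();

            using (var response = await _client.GetObjectAsync(request))
            using (var responseStream = response.ResponseStream)
            {
                await responseStream.CopyToAsync(downloadStream);
            }

            downloadStream.Position = 0; // rewind for the consumer
            return downloadStream;
        }
    }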
As a side note: when reading (or writing) using the Stream API, consider renting your scratch buffers from the array pool rather than allocating a fresh array each time; a sketch of both patterns follows.
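Assuming a hypothetical copy loop between two streams (source and destination), and an arbitrary 8192-byte scratch buffer: instead of:

    var buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        destination.Write(buffer, 0, bytesRead);
    }

instead try:

    using System.Buffers;

    // Rent may return an array longer than requested; that is fine here
    // because explicit offsets and counts are always passed along.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(8192);
    try
    {
        int bytesRead;
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, bytesRead);
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }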
In more advanced scenarios, it may be wise to use the pipelines API (System.IO.Pipelines) rather than the stream API; the point here is that pipelines allow discontiguous buffers, so you never need ridiculously large contiguous buffers even when dealing with complex scenarios. This is a niche API, however, and has very limited support in public APIs.
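For a flavor of what that looks like, a minimal sketch that pumps one read from a source stream through a Pipe to a destination stream (the method name and both stream parameters are assumptions; note the reader sees a ReadOnlySequence<byte>, which may be made of multiple smaller segments rather than one large array):

    using System;
    using System.Buffers;
    using System.IO;
    using System.IO.Pipelines;
    using System.Threading.Tasks;

    static async Task CopyOnceThroughPipeAsync(Stream source, Stream destination)
    {
        var pipe = new Pipe();

        // Writer side: rent memory from the pipe, fill it, and flush.
        Memory<byte> memory = pipe.Writer.GetMemory(4096);
        int bytesRead = await source.ReadAsync(memory);
        pipe.Writer.Advance(bytesRead);
        await pipe.Writer.FlushAsync();
        await pipe.Writer.CompleteAsync();

        // Reader side: the data arrives as a possibly discontiguous sequence,
        // so no single large contiguous buffer is ever required.
        ReadResult result = await pipe.Reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;
        foreach (ReadOnlyMemory<byte> segment in buffer)
        {
            await destination.WriteAsync(segment);
        }
        pipe.Reader.AdvanceTo(buffer.End);
        await pipe.Reader.CompleteAsync();
    }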