TextWriter/StreamWriter high memory usage

I have a console app that reads a large text file with 40k+ lines. Each line is a key that I use in a search, and the results are written to an output file. The issue is that after the app has been running for a while it suddenly closes, and I noticed that the process memory usage was very high: it was sitting at 1.6 GB the last time I saw it crash.

I looked around and didn't find many answers. I did try gcAllowVeryLargeObjects, but that feels like I'm just dodging the problem.

Below is a snippet from my main() of where I write out to the file. I can't seem to understand why the memory usage gets so high. I flush the writer after every write (could it be because I'm keeping the file open for such a long period of time?).

TextWriter writer = new StreamWriter("output.csv", false);
foreach (var item in list)
{
    // Progress indicator: current item / total items
    Console.WriteLine("{0}/{1}", count, numofitem);
    var result = TableServiceContext.Read(item.id);
    if (result != null)
    {
        writer.WriteLine(String.Join(",", result.id, result.code, result.hash));
    }
    count++;
    // Flush after every item so nothing stays buffered in the writer
    writer.Flush();
}
writer.Close();

Edit: I have 32 GB of RAM on my computer, so I am sure the crash is not caused by the machine running out of physical memory.

Edit2: changed the name of the repository as that was misleading.

There are 2 answers

omikad

If the average line length is 1 KB, then 40K lines is 40 MB, which is nothing. That's why I'm pretty sure the problem is in your repository class. If it is an Entity Framework repository, try recreating the DbContext for each line.

If you want to tune your program, you can use the following method: add timestamps to the Console output (you can use the Stopwatch class) and recreate your repository every 10, 100, or N lines. Then, by looking at the timestamps, you can find the optimal N to use.

var timer = Stopwatch.StartNew();
...
Console.WriteLine(timer.ElapsedMilliseconds);
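
The batching idea above could look roughly like the sketch below. FakeRepository, BatchDemo, and ProcessInBatches are hypothetical names invented for illustration; FakeRepository only stands in for whatever repository or context the real program uses, so treat this as a shape to adapt rather than the actual fix.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the real repository/context (assumption, not the asker's class).
sealed class FakeRepository : IDisposable
{
    public string Read(int id) { return "row" + id; }
    public void Dispose() { /* the real class would release its resources here */ }
}

static class BatchDemo
{
    // Reads `total` items, replacing the repository every `batchSize` items.
    // Returns how many repository instances were created.
    public static int ProcessInBatches(int total, int batchSize)
    {
        int created = 0;
        var timer = Stopwatch.StartNew();
        FakeRepository repo = null;
        try
        {
            for (int i = 0; i < total; i++)
            {
                if (i % batchSize == 0)
                {
                    // Drop the old instance so anything it tracked can be collected.
                    repo?.Dispose();
                    repo = new FakeRepository();
                    created++;
                    Console.WriteLine("{0} ms: new repository at item {1}",
                        timer.ElapsedMilliseconds, i);
                }
                repo.Read(i);
            }
        }
        finally
        {
            repo?.Dispose();
        }
        return created;
    }

    static void Main()
    {
        // Try several batch sizes and compare the printed timestamps.
        foreach (int n in new[] { 10, 100, 1000 })
        {
            Console.WriteLine("Batch size {0}", n);
            ProcessInBatches(40000, n);
        }
    }
}
```

With the real repository swapped in, comparing the timestamps for different batch sizes shows where recreation cost and memory growth balance out.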

DrKoch

From looking at the code, I think the problem isn't the StreamWriter but a memory leak in your repository. Suggestions to check:

  • Replace the repository with a dummy, e.g. a class dummy_repository with just the three properties id, value, hash.
  • Likewise, create a long "list", e.g. 40k small entries.
  • Run your program and see if it still consumes memory (I am pretty sure it will not).
  • Then add back your original parts step by step and see which step causes the memory leak.
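
A minimal sketch of that isolation test, assuming a dummy repository is enough to reproduce the write loop. DummyRepository, DummyResult, LeakCheck, and Run are hypothetical names; the three fields mirror the id, code, and hash columns written in the question's snippet. GC.GetTotalMemory(true) only reports managed heap size after a forced collection, so it gives a rough indication rather than a full picture of process memory.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical dummy result with the three columns the question writes out.
sealed class DummyResult
{
    public string Id = "id";
    public string Code = "code";
    public string Hash = "hash";
}

// Hypothetical dummy repository: no database, no caching, no change tracking.
sealed class DummyRepository
{
    public DummyResult Read(int key) { return new DummyResult { Id = key.ToString() }; }
}

static class LeakCheck
{
    // Writes `count` dummy rows to `path` and returns the change in managed heap size.
    public static long Run(int count, string path)
    {
        var repo = new DummyRepository();
        long before = GC.GetTotalMemory(true);

        // `using` closes and flushes the writer even if an exception is thrown.
        using (var writer = new StreamWriter(path, false))
        {
            foreach (int key in Enumerable.Range(0, count))
            {
                DummyResult result = repo.Read(key);
                writer.WriteLine(string.Join(",", result.Id, result.Code, result.Hash));
            }
        }

        long after = GC.GetTotalMemory(true);
        return after - before;
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "output_dummy.csv");
        long growth = Run(40000, path);
        Console.WriteLine("Managed heap growth: {0} bytes", growth);
    }
}
```

If the growth stays small here, the StreamWriter loop is unlikely to be the culprit, and the original repository can be reintroduced piece by piece to find the step where memory starts to climb.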