DataTable huge memory consumption


I'm loading CSV data from files into a DataTable for processing.

The problem is that I want to process several files, and my tests with the DataTable show huge memory consumption. I tested with a 37 MB CSV file and the memory grew to 240 MB, which is way too much IMHO. I read that there is overhead in a DataTable, and I could live with about 70 MB, but not 240 MB, which is six times the original size. I read here that DataTables need more memory than POCOs, but that difference is way too much.

I attached a memory profiler and checked whether I have memory leaks and where the memory goes. I found that the DataColumns each hold between 6 MB and 19 MB of strings, and the DataTable has about 20 columns. Are the values stored in the columns? Why is so much memory taken, and what can I do to reduce the consumption? With this memory consumption, DataTables seem to be unusable.

Has anybody else had such problems with DataTables, or am I doing something wrong?

PS: I tried a 70 MB file and the DataTable grew to 500 MB!

OK, here is a small test case: the 37 MB CSV file (21 columns) makes the memory grow to 179 MB.

    private static DataTable ReadCsv()
    {
        DataTable table = new DataTable();
        table.BeginLoadData();

        using (var reader = new StreamReader(File.OpenRead(@"C:\Develop\Tests\csv-Data\testdaten\test.csv")))
        {               
            int y = 0;
            int columnsCount = 0;
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                var values = line.Split(',');

                if (y == 0)
                {
                    columnsCount = values.Length; // Length avoids needing System.Linq for Count()
                    // create columns
                    for (int x = 0; x < columnsCount; x++)
                    {
                        table.Columns.Add(new DataColumn(values[x], typeof(string)));
                    }
                }
                else
                {
                    if (values.Length == columnsCount)
                    {
                        // add the data
                        table.Rows.Add(values);
                    }
                }

                y++;
            }

            table.EndLoadData();
            table.AcceptChanges();

        }

        return table;
    }

There are 2 answers

Nicholas Carey (Best Answer)

DataSet and its children DataTable, DataRow, etc. make up an in-memory relational database. There is a lot of overhead involved (though it does make [some] things very convenient).

If memory is an issue,

  • Build domain objects to represent each row in your CSV file with typed properties (see the sketch after this list).
  • Create a custom collection (or just use an IList<T>) to hold them.
  • Alternatively, build a light-weight class with the basic semantics of a DataTable:
    • the ability to select a row by number
    • the ability to select a column within a row by row number and either column name or number
    • the ability to know the ordered set of column names
    • Bonus: the ability to select a column by name or ordinal number and receive a list of its values, one per row.
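Here is a minimal sketch of that approach, assuming your processing can work with typed row objects; the class name, the two sample properties, and the reading logic are hypothetical and would need to match your actual columns:

    // Hypothetical row type: one typed property per CSV column (only two shown here).
    public class CsvRow
    {
        public string Name { get; set; }
        public string Value { get; set; }
    }

    private static List<CsvRow> ReadCsvRows(string path)
    {
        var rows = new List<CsvRow>();

        using (var reader = new StreamReader(File.OpenRead(path)))
        {
            reader.ReadLine(); // skip the header line

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var values = line.Split(',');
                rows.Add(new CsvRow { Name = values[0], Value = values[1] });
            }
        }

        return rows;
    }

This keeps one small object plus its strings per row, instead of the DataTable's row, column and change-tracking bookkeeping.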

Are you sure you need an in-memory representation of your CSV files? Could you access them via an IDataReader like Sebastien Lorion's Fast CSV Reader?
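For illustration, here is a rough sketch of that route using the LumenWorks CsvReader (which implements IDataReader); the constructor and member names below are from memory of that library and may differ in the version you use, and ProcessValue is a hypothetical handler of your own:

    using (var csv = new CsvReader(new StreamReader(@"C:\Develop\Tests\csv-Data\testdaten\test.csv"), true))
    {
        int fieldCount = csv.FieldCount;
        string[] headers = csv.GetFieldHeaders();

        // Only the current record is held in memory at any time.
        while (csv.ReadNextRecord())
        {
            for (int i = 0; i < fieldCount; i++)
            {
                ProcessValue(headers[i], csv[i]); // hypothetical per-value handler
            }
        }
    }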

Gary Walker

DataTables are a generic solution of putting tablular data into memory and adding lots of table-related features. If the overhead is not acceptable for you have the option to 1) write your own DataTable class that eliminates the overhead that you don't need 2) Use an alternate representation that still accomplishes what you need, perhaps POCO based, or maybe an XMLDocument (May have just as much overhead maybe more, never really worried about it). 3) Stop trying to load everything into memory and just bring data in as needed from your external store.