I'm running Delphi RAD Studio XE2.
I have some very large files, each containing a large number of lines. The lines themselves are small: just 3 tab-separated doubles. I want to load a file into a TStringList using TStringList.LoadFromFile, but this raises an exception with large files.
For files of 2 million lines (approximately 1GB) I get the EIntOverflow exception. For larger files (20 million lines and approximately 10GB, for example) I get the ERangeCheck exception.
I have 32GB of RAM to play with and am just trying to load this file and use it quickly. What's going on here, and what other options do I have? Could I use a file stream with a large buffer to load this file into a TStringList? If so, could you please provide an example?
When Delphi switched to Unicode in Delphi 2009, the TStrings.LoadFromStream() method (which TStrings.LoadFromFile() calls internally) became very inefficient for large streams/files.

Internally,
LoadFromStream() reads the entire file into memory as a TBytes, then converts that to a UnicodeString using TEncoding.GetString() (which decodes the bytes into a TCharArray, copies that into the final UnicodeString, and then frees the array), then parses the UnicodeString (while the TBytes is still in memory), adding substrings into the list as needed.

So, just prior to
LoadFromStream() exiting, there are four copies of the file data in memory: three copies taking up at worst filesize * 3 bytes of memory (where each copy uses its own contiguous memory block plus some memory manager overhead), and one copy for the parsed substrings! Granted, the first three copies are freed when LoadFromStream() actually exits. But this explains why you are getting memory errors before reaching that point: LoadFromStream() is trying to use 3-4 GB of memory to load a 1GB file, and the RTL's memory manager cannot handle that.

If you want to load the content of a large file into a
TStringList, you are better off using TStreamReader instead of LoadFromFile(). TStreamReader uses a buffered file I/O approach to read the file in small chunks. Simply call its ReadLine() method in a loop, Add()'ing each line to the TStringList.

Maybe some day, LoadFromStream() might be rewritten to use TStreamReader internally like this.
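
The ReadLine() loop described above might look something like the following. This is only a minimal sketch: the procedure name LoadLinesBuffered, the 64KB buffer size, the use of TEncoding.Default, and taking the file name from the first command-line parameter are all assumptions made for illustration, so adjust them to suit your data.

```pascal
program LoadLargeFile;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper (not part of the RTL): fills AList with the lines of
// AFileName, reading through TStreamReader's buffer instead of loading the
// whole file into memory first.
procedure LoadLinesBuffered(const AFileName: string; AList: TStrings);
var
  Reader: TStreamReader;
begin
  // Assumed settings: system default ANSI encoding, BOM detection enabled,
  // and a 64KB read buffer.
  Reader := TStreamReader.Create(AFileName, TEncoding.Default, True, 65536);
  try
    AList.BeginUpdate;
    try
      AList.Clear;
      while not Reader.EndOfStream do
        AList.Add(Reader.ReadLine);
    finally
      AList.EndUpdate;
    end;
  finally
    Reader.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // The file name is taken from the first command-line parameter.
    LoadLinesBuffered(ParamStr(1), Lines);
    Writeln(Lines.Count, ' lines loaded');
  finally
    Lines.Free;
  end;
end.
```

This avoids the temporary whole-file copies, but the parsed lines themselves still have to fit in the process's address space. A 10GB file cannot fit in a 32-bit process no matter how it is read, so for files of that size you would need to compile for the 64-bit Windows target that XE2 provides.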