I have a C# application that, at one point, scans a folder that could contain tens of thousands of files. It then filters that list by name and length and selects a relatively small number of files for processing.
Simplified code:
```csharp
DirectoryInfo directoryInfo = new DirectoryInfo(path);
FileSystemInfo[] fileSystemInfos = directoryInfo.GetFileSystemInfos();
List<MyInfo> myInfoList = fileSystemInfos
    // Keep files only (skip subdirectories).
    .Where(f => (f.Attributes & FileAttributes.Directory) != FileAttributes.Directory)
    .Select(f => new MyInfo {
        FilePath = f.FullName,
        // A new FileInfo is constructed for every file just to read its length.
        FileSize = new FileInfo(f.FullName).Length,
    })
    .ToList();
```
The logic later selects a handful of files and verifies a non-zero length.
The problem is that the individual calls to new FileInfo(f.FullName).Length are killing performance. Under the covers, I see that FileInfo internally stores a WIN32_FILE_ATTRIBUTE_DATA struct that contains the length (fileSizeLow and fileSizeHigh), but it does not expose that struct as a property.
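For reference, that struct splits the 64-bit length across two 32-bit fields. The sketch below shows how the two halves combine into the single value that FileInfo.Length reports; the FileSizeParts class and its Combine method are hypothetical names used only for illustration, not part of .NET.

```csharp
using System;

public static class FileSizeParts
{
    // Hypothetical helper: rebuilds a 64-bit file size from the high and low
    // 32-bit halves, as stored in the fileSizeHigh / fileSizeLow fields.
    public static long Combine(uint fileSizeHigh, uint fileSizeLow)
    {
        // Shift the high half into the upper 32 bits, then OR in the low half.
        return ((long)fileSizeHigh << 32) | fileSizeLow;
    }
}
```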
Question: Is there a simple alternative to the above that can retrieve file names and lengths efficiently without the extra FileInfo.Length call?
My alternative is to make the MyInfo.FileSize property a lazy load property, but I wanted to check for a more direct approach first.
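A minimal sketch of that lazy-load alternative, assuming MyInfo can be changed to take the path in its constructor and to wrap the size in System.Lazy<long>. The MyInfoLoader class and its Load method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Assumed shape of MyInfo: the size is computed only on first access.
public class MyInfo
{
    private readonly Lazy<long> _fileSize;

    public MyInfo(string filePath)
    {
        FilePath = filePath;
        // Deferred: the FileInfo lookup runs the first time FileSize is read.
        _fileSize = new Lazy<long>(() => new FileInfo(filePath).Length);
    }

    public string FilePath { get; }

    public long FileSize => _fileSize.Value;
}

// Hypothetical loader mirroring the original query, minus the eager size lookup.
public static class MyInfoLoader
{
    public static List<MyInfo> Load(string path)
    {
        var directoryInfo = new DirectoryInfo(path);
        return directoryInfo.GetFileSystemInfos()
            // Keep files only (skip subdirectories).
            .Where(f => (f.Attributes & FileAttributes.Directory) != FileAttributes.Directory)
            .Select(f => new MyInfo(f.FullName))
            .ToList();
    }
}
```

With this shape, building the list costs only the directory enumeration; the extra FileInfo lookup is paid only for the handful of files whose FileSize is actually read.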