Hello Everyone and thanks for your attention.

(I have already googled my question and please understand that my question is different from those that asked for best way to read a LARGE excel file not many or for writing into Excel files, I just want to read them)

At the moment i am working on an small program that it's purpose can be defined as to reading and analyzing anywhere between a handful to a couple of hundred and possibly thousands of log files in excel format to collect statistical results, now i know the decision to save logs as Excel is odd, but i am not the one who decided that, thats just how it is for me.

as you might guess what i plan to do is to read data from every and each excel file and load it into memory as a C# model class before processing. The way i am reading the file excel files is based on my experience from using Interop, Opening an Excel file, reading data from few cells i need then close that file and move unto next and let another thread handle processing.

I have already tried testing my approach using Interop and it was super slow. Here is the part code that handeled the reading

                List<string> strings = new List<string>();
            Microsoft.Office.Interop.Excel.Application xlApp = null;
            Microsoft.Office.Interop.Excel.Workbook xlWorkBook = null;
            Microsoft.Office.Interop.Excel.Worksheet xlWorkSheet = null;
            Microsoft.Office.Interop.Excel.Range range = null;

            try
            {
                string str;
                int rCnt;
                int cCnt;
                int row = 0;
                int column = 0;

                xlApp = new Microsoft.Office.Interop.Excel.Application();
                xlWorkBook = xlApp.Workbooks.Open(path, 0, true, 5, "", "", true,
                    Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
                xlWorkSheet = (Microsoft.Office.Interop.Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

                range = xlWorkSheet.UsedRange;
                row = range.Rows.Count;
                column = range.Columns.Count;
                strings.Clear();

                for (rCnt = 1; rCnt <= row; rCnt++)
                {
                    for (cCnt = 1; cCnt <= column; cCnt++)
                    {
                        str = (range.Cells[rCnt, cCnt] as Microsoft.Office.Interop.Excel.Range).Value2?.ToString();
                        if (!string.IsNullOrEmpty(str))
                            strings.Add(str);
                    }
                }

                xlWorkBook.Close(true);
                xlApp.Quit();
            }
    catch (Exception exception)
            {
                exception.Log();
            }
            finally
            {
                //https://www.add-in-express.com/creating-addins-blog/2013/11/05/release-excel-com-objects/
                if (xlWorkBook != null) Marshal.FinalReleaseComObject(xlWorkBook);
                if (xlWorkSheet != null) Marshal.FinalReleaseComObject(xlWorkSheet);
                if (range != null) Marshal.FinalReleaseComObject(range);
                if (xlApp != null) Marshal.FinalReleaseComObject(xlApp);
            }
            return strings.ToArray();

So in order to resolve my problm i am hoping to find someone who faced similar problem in past and found a way around it or answer some questions i have:

  • What is the fastest free library when it comes to reading from multiple Excel files(as in it doesn't take too long opening and closing files or bother reading to the end of file or reading it from start each time when it goes to get a cell)
  • Would it improve the speed if i just loaded the entire excel file into memory and read it there and is there a Library that works like that?
  • Would it Improve speed if I loaded files that are in 2 different drives from 2 different thread?

1 Answers

0
Community On

Are you able to use nuget packages? If so, have a look at the ExcelDataReader library and its ExcelReaderFactory. https://www.nuget.org/packages/ExcelDataReader

For xls files, I would use var excelReader = ExcelReaderFactory.CreateBinaryReader(stream), then you can use the reader to read it into a dataset var data = reader.AsDataSet();

You can also tell it if the first row contains column names before you read the dataset, such as reader.IsFirstRowAsColumnNames = true;

If you've got .xlsx files, you can use ExcelReaderFactory.CreateOpenXmlReader(stream);