I have to deal with very big Excel file. this file contains historical client data and I need to convert the Excel worksheet data into data table object. the data table object will passed to the another process for parsing.

I need a very fast way of converting worksheet data into data table. Any Idea about it?

I am currently using Aspose but it take around 23 Minutes to do the conversion. I try the OleDb it was very fast but it just work with physical file and required file path.my file data available as array of bytes. I don't want to write them on disk and use physical path in OleDB connection.

I have tried OpenXml but it was very slow (it took more then hour)

Using Aspose:

Workbook workbook = null;
loadOptions.MemorySetting = MemorySetting.MemoryPreference;
using (Stream stream = new MemoryStream(file))
{
   workbook = new Workbook(stream, loadOptions);
}

var worksheet = workbook.Worksheets[0];

DataTable dataTable = worksheet.Cells.ExportDataTableAsString(
                        0,
                        0,
                        worksheet.Cells.MaxDataRow + 1,
                        worksheet.Cells.MaxDataColumn + 1,
                        true);

took around 23 minutes

Using OleDB:

var fileName = "e:\\test1.xlsx";
var query = "SELECT * FROM [Data$]";
using (OleDbConnection cn = new OleDbConnection { ConnectionString = this.ConnectionString(fileName, "No") })
{
    using (OleDbCommand cmd = new OleDbCommand { CommandText = query, Connection = cn })
    {
         cn.Open();

         OleDbDataReader dr = cmd.ExecuteReader();
                dt.Load(dr);
     }
}

took around 2.5 minutes

I need a fast way to convert excel worksheet into data table for large Excel file (250 MB) less than 2.5 Min. The Excel file data available in Stream object.

2 Answers

0
Ahsan Iqbal On

You may try using LightCells feature of Aspose.Cells where large files can be read in short time and then device own logic to fill the DataTable. Regarding the slow performance of ExportDataTableAsString(), this is already under consideration by Aspose.Cells team.

https://docs.aspose.com/display/cellsnet/Using+LightCells+API#UsingLightCellsAPI-ReadingLargeExcelFiles:Example

Note: I am working as Support developer/ Evangelist at Aspose.

0
Ahsan Iqbal On

Please use LightCells API to read data and export it to datatable. Remove all code about Aspose.Cells to export data to datatable as it takes long time to export.

DateTime startTime = DateTime.Now;
DateTime finishTime = DateTime.Now;
int columns = 442;
int rows = 181872;
DataTable dt = new DataTable();
string[] values = new string[columns];
for (int i = 0; i < columns; i++)
{
    values[i] = "test" + i;
    dt.Columns.Add(values[i]);
}
for (int row = 0; row < rows; row++)
{
    DataRow dataRow = dt.NewRow();
    dt.Rows.Add(dataRow);
    for (int col = 0; col < columns; col++)
    {
        dataRow[col] = values[col];
    }
}
finishTime = DateTime.Now;
Console.WriteLine("load Excel worksheet data into Data table: (Aspose) " + (finishTime - startTime));