I am trying to read .xls files with Microsoft.ACE.OLEDB.12.0. I was able to read some of the files, but others turned out to be HTML files with an .xls extension, and those throw this error: "External table is not in the expected format."
These HTML files have a structure like this:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html;" charset="utf-8">
<meta name="ProgId" content="Excel.Sheet">
<meta name="Generator" content="Microsoft Excel 11">
<title>Document and Custom Property </title>
<!--[if gte mso 9]><xml><o:CustomDocumentProperties><o:BUSINESSGROUP dt:dt="string">CHANNELA</o:BUSINESSGROUP><o:BUSINESSGROUPID dt:dt="string">2</o:BUSINESSGROUPID></o:CustomDocumentProperties></xml><![endif]-->
</head>
<body>
...
</body>
</html>
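Since both kinds of file share the .xls extension, the two can be told apart by their leading bytes before a connection string is chosen. Below is a minimal sketch in Python, not part of my actual OLE DB code. It assumes a real binary .xls starts with the OLE2 compound-file signature and that the HTML variant starts with `<html` or a doctype. The names `sniff_xls_kind` and `sniff_file` are hypothetical helpers made up for illustration.

```python
# Signature found at the start of every OLE2 compound file (binary .xls)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_xls_kind(head: bytes) -> str:
    """Classify a file from its first bytes: 'binary', 'html', or 'unknown'."""
    if head.startswith(OLE2_MAGIC):
        return "binary"
    # Drop an optional UTF-8 byte-order mark, then leading whitespace
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    text = head.lstrip().lower()
    # Assumption: the HTML exports begin with <html ...> or a doctype
    if text.startswith((b"<html", b"<!doctype html")):
        return "html"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 512 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_xls_kind(f.read(512))
```

With such a check, only the files reported as "html" would need the HTML Import connection string or a different reader; the rest can keep the normal Excel one.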
I changed my connection string to
Provider=Microsoft.ACE.OLEDB.12.0; Data Source=file.xls;Extended Properties="HTML Import"
in order to read the HTML files. However, this is very slow. It also skips <div> and other tags and reads only the contents of <td> tags. Excel 2013, by contrast, opens the same HTML file very quickly and doesn't skip any tag content.
How can I read such HTML files the way Excel does?
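To illustrate what I mean by "doesn't skip any tag content", here is a minimal sketch in Python that uses the standard-library `html.parser` instead of OLE DB. It collects text from every element, so <div> text is returned alongside <td> text. `TextCollector` and `collect_text` are hypothetical names, and the sketch makes no attempt to reproduce Excel's row and column layout.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects non-empty text from every element except script/style/title."""

    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def collect_text(html: str) -> list:
    """Return the text chunks of the document in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = (
    "<html><head><title>Doc</title></head><body>"
    "<div>Report header</div>"
    "<table><tr><td>A</td><td>1</td></tr></table>"
    "</body></html>"
)
print(collect_text(sample))  # ['Report header', 'A', '1']
```

The <div> text comes back together with the table cells, which is the behaviour I am missing from the HTML Import provider.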