I have the following code segment to read a CSV file. I am having trouble reading lines that are not quite standard. For example, a line like
105,"XXX Bank Azerbaijan" CJSC,1078 ,AZ,Baku,"xxx street",Nasimi district
ends up in the catch block, because the quotes around the second field, "XXX Bank Azerbaijan" CJSC, are not directly adjacent to the commas. However, when I open this file in Excel, it has no problem and separates the fields correctly as:
105|XXX Bank Azerbaijan CJSC|1078|AZ|Baku|xxx street|Nasimi district
where I used | as the column separator. Is there a way to get the same effect using TextFieldParser? Otherwise I will need to use a different CSV reader.
using (TextFieldParser parser = new TextFieldParser(fileName, Encoding.GetEncoding("windows-1252")))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.TrimWhiteSpace = true;
    parser.HasFieldsEnclosedInQuotes = true;
    parser.ReadLine(); // Reads dummy header
    while (!parser.EndOfData)
    {
        try
        {
            string[] fieldRow = parser.ReadFields();
            T fieldsClass = new T();
            fieldsClass.Initialize(fieldRow);
            data.Add(fieldsClass);
            rowCount++;
        }
        catch
        {
            Console.WriteLine("Skipping line" + parser.ErrorLine);
        }
    }
}
First off, if this is actually incorrectly formatted data, then your best bet would be to do what Hans Passant suggested and:
However, if this is correctly formatted as per whatever specifications were agreed upon way back when, then you can try setting parser.HasFieldsEnclosedInQuotes = false. This will get it to parse, BUT it won't strip out the double quotes the way your sample Excel import does. It will also cause TextFieldParser to split a quoted field such as "foo,bar" into two fields, "foo and bar", instead of returning it as the single field foo,bar. You may be able to work around this by having the source of the data use a different delimiter, one that won't be found in the middle of field values.
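A minimal sketch of that approach, using the sample line from the question. The class name LooseCsv and the method SplitLine are made up for illustration, and stripping every double quote by hand is an assumption about what you want: it also removes any quotes that were meant to be part of a value. It assumes the Microsoft.VisualBasic assembly is referenced.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of TextFieldParser itself.
public static class LooseCsv
{
    // Splits one comma-delimited line with quote handling turned off,
    // then removes the leftover double quotes from each field by hand.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.TrimWhiteSpace = true;
            // With this off, quotes are treated as ordinary characters,
            // so a comma inside a quoted field WILL split that field.
            parser.HasFieldsEnclosedInQuotes = false;

            string[] fields = parser.ReadFields();
            for (int i = 0; i < fields.Length; i++)
            {
                // Assumption: no field needs to keep a literal double quote.
                fields[i] = fields[i].Replace("\"", "").Trim();
            }
            return fields;
        }
    }

    public static void Main()
    {
        string line = "105,\"XXX Bank Azerbaijan\" CJSC,1078 ,AZ,Baku,\"xxx street\",Nasimi district";
        // Prints: 105|XXX Bank Azerbaijan CJSC|1078|AZ|Baku|xxx street|Nasimi district
        Console.WriteLine(string.Join("|", SplitLine(line)));
    }
}
```

This only matches the Excel result as long as no field value contains a comma.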
It may be easier to change the specifications to use | as the field delimiter instead of , and deal with the double quotes in each field yourself, than to change the specifications to only allow double quotes immediately before and after a field delimiter.
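If the source can be switched to |, a sketch along these lines might be enough. The names PipeDelimited and SplitLine are hypothetical, the input line is invented for illustration (the sample record with a comma added to the street field), and it assumes no field value ever contains a | or needs to keep a literal double quote.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper showing the pipe-delimited variant.
public static class PipeDelimited
{
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("|");
            parser.TrimWhiteSpace = true;
            // Quotes are plain characters here; only | separates fields,
            // so commas inside a value no longer matter.
            parser.HasFieldsEnclosedInQuotes = false;

            string[] fields = parser.ReadFields();
            for (int i = 0; i < fields.Length; i++)
            {
                // Deal with the double quotes in each field by hand.
                fields[i] = fields[i].Replace("\"", "").Trim();
            }
            return fields;
        }
    }

    public static void Main()
    {
        // Invented sample: note the comma inside the quoted street field.
        string line = "105|\"XXX Bank Azerbaijan\" CJSC|1078 |AZ|Baku|\"xxx street, 5\"|Nasimi district";
        foreach (string field in SplitLine(line))
        {
            Console.WriteLine(field);
        }
    }
}
```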