C# TextFieldParser error


I use the following code to read a CSV file, but it fails on lines that are not quite standard. For example, a line like

105,"XXX Bank Azerbaijan" CJSC,1078      ,AZ,Baku,"xxx street",Nasimi district

ends up in the catch block because the second field, "XXX Bank Azerbaijan" CJSC, has quotes that are not directly next to the commas. However, when I open this file in Excel, it has no problem and splits the fields correctly as:

105|XXX Bank Azerbaijan CJSC|1078|AZ|Baku|xxx street|Nasimi district

where I used | as the column separator. Is there a way to get the same result with TextFieldParser, or will I need to use a different CSV reader?

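    // Requires: using System; using System.Text; using Microsoft.VisualBasic.FileIO;
    // On .NET Core / .NET 5+, "windows-1252" is only available after calling
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once at startup.
    // T, data (a List<T>) and rowCount come from the surrounding generic method.
    using (TextFieldParser parser = new TextFieldParser(fileName, Encoding.GetEncoding("windows-1252")))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.TrimWhiteSpace = true;
        parser.HasFieldsEnclosedInQuotes = true;

        parser.ReadLine(); // Skips the header line

        while (!parser.EndOfData)
        {
            try
            {
                string[] fieldRow = parser.ReadFields();
                T fieldsClass = new T();
                fieldsClass.Initialize(fieldRow);
                data.Add(fieldsClass);
                rowCount++;
            }
            catch (MalformedLineException)
            {
                // ErrorLine holds the raw text of the line that could not be parsed
                Console.WriteLine("Skipping line: " + parser.ErrorLine);
            }
        }
    }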

Answer by David Jacobsen:

First, if this data really is malformed, your best bet is to follow Hans Passant's suggestion:

Best thing to do is to send the file back and get the programmer to fix the bug in his code. The only other thing you can do is fix the string yourself before you let the parser see it.
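The second option, fixing the string before the parser sees it, could look roughly like the sketch below. It assumes the Excel-like reading shown in the question is what you want: every double quote opens or closes a quoted section and is dropped, commas inside a quoted section are kept, a doubled "" inside quotes is a literal quote, and text after a closing quote is appended to the same field. This is an approximation of what Excel appears to do for the sample line, not a documented Excel algorithm. The names LineRepair, RepairLine and ParseRepaired are hypothetical; the repaired line is handed to TextFieldParser through its TextReader constructor.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.VisualBasic.FileIO;

public static class LineRepair
{
    // Hypothetical helper: splits a line leniently, then rebuilds it so that
    // every field is fully enclosed in quotes (well-formed CSV).
    public static string RepairLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote inside a quoted section
                    i++;
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote: toggle and drop it
                }
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString()); // delimiter outside quotes ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());

        // Re-quote each trimmed field, doubling any literal quotes it contains.
        return string.Join(",", fields.Select(f => "\"" + f.Trim().Replace("\"", "\"\"") + "\""));
    }

    // Parses a single repaired line with the same settings as the question's code.
    public static string[] ParseRepaired(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(RepairLine(line))))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.TrimWhiteSpace = true;
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }
}
```

The sketch does not detect an unbalanced quote: a stray opening quote turns the rest of the line into one field, so you may prefer to log such lines rather than accept them silently.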

However, if this is correctly formatted according to whatever specification was agreed on originally, you can try setting parser.HasFieldsEnclosedInQuotes = false. This will let the line parse, BUT it won't strip out the double quotes the way your Excel import does. It will also cause TextFieldParser to parse "foo,bar" as two fields, '"foo' and 'bar"', instead of one field, "foo,bar". You may be able to work around this by having the data source use a different delimiter, one that never appears inside field values.

It may be easier to change the specification to use | as the field delimiter instead of , and handle the double quotes in each field yourself, than to change it so that double quotes are only allowed immediately before and after a field delimiter.
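A minimal sketch of the approach in the last two paragraphs, assuming you parse one line at a time: quote handling is switched off, and the delimiter is a parameter so the same code covers both the current comma format and a hypothetical pipe-delimited one. Removing every double quote afterwards is an added post-processing step (TextFieldParser does not do it) that approximates the Excel result for the sample line. UnquotedParsing and Parse are illustrative names.

```csharp
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class UnquotedParsing
{
    // Parses one line with HasFieldsEnclosedInQuotes = false: quotes are kept as
    // ordinary characters and every delimiter splits a field.
    public static string[] Parse(string line, string delimiter, bool stripQuotes)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            parser.TrimWhiteSpace = true;
            parser.HasFieldsEnclosedInQuotes = false;

            string[] fields = parser.ReadFields();
            if (!stripQuotes)
                return fields;

            // Assumed post-processing: drop every double quote, then re-trim.
            return fields.Select(f => f.Replace("\"", "").Trim()).ToArray();
        }
    }
}
```

With "," as the delimiter, the sample line parses without an exception, and field 2 is "XXX Bank Azerbaijan" CJSC with its quotes still in place unless stripQuotes is true. A comma inside a quoted value is still split, which is the "foo,bar" problem; with "|" as the delimiter, a value such as "Bank, Azerbaijan" CJSC stays in one field.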