C# Can't read some characters from a file


I'm new to C#. I have done some research on this problem but couldn't find anything, perhaps because I lack the vocabulary to search for it.

My task is to read a huge file and extract only the lines that meet certain conditions.

Code I'm using to test some things:

using (StreamReader sr = new StreamReader("SPDS_Test.doc"))
{
    while ((line = sr.ReadLine()) != null)
    {
        try
        {
            if (line.Contains("R  ") | line.Contains("E  "))
            {
                data = line;
                data = data.Remove(0, 1);
                data= data.Replace(" ", "").Replace("N", "").Replace("+", ",").Replace("·", ",").Replace("?", ",").Replace("(", "").Replace(")", "");
                Data.Add(data);
            }
        }
        catch (Exception e)
        {
            // Print the exception details; the format string needs a {0} placeholder for e.
            Console.WriteLine("-------- {0}", e);
            Console.WriteLine("--------Press any to continue---------");
            Console.ReadKey();
        }
    }

    foreach (string d in Data)
    {
        Console.WriteLine(d);
        Console.ReadKey();
    }
}

This is part of the file:

R    XRPA168VC 
B    A 
L    手动紧急停堆 
E    XRPA300KS 
A    反应堆停堆 汽轮机停机

R    XRPR111VR 
B    IP 
E    F2/3(XRPR144KS, XRPR145KS, XRPR146KS)

What I noticed is that the letters aren't even recognized as letters when there are Chinese characters around them. For example, I tried the condition line.Substring(0,1) == "R", and it couldn't find those lines.

No matter what I do, my code only returns this:

XRPR111VR
F2/3XRPR144KS, XRPR145KS, XRPR146KS

I really need to be able to extract every R and E line.

There are 2 answers

MisterYUE

I just tried copying the whole document into Notepad and saving it with UTF-8 encoding. It seems to work afterward, but I'm not sure whether it's reliable.
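If re-saving as UTF-8 works, the reading side can state that encoding explicitly instead of relying on detection. The following is only a minimal sketch under some assumptions: the file has been re-saved as UTF-8 plain text under the hypothetical name SPDS_Test.txt, and the wanted lines start with R or E followed by whitespace. RecordFilter and ExtractRecords are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class RecordFilter
{
    // Keeps lines whose first character is 'R' or 'E' followed by whitespace,
    // and returns the rest of each such line with surrounding whitespace trimmed.
    public static List<string> ExtractRecords(IEnumerable<string> lines)
    {
        return lines
            .Where(l => l.Length > 1 && (l[0] == 'R' || l[0] == 'E') && char.IsWhiteSpace(l[1]))
            .Select(l => l.Substring(1).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Assumption: the document was re-saved as UTF-8 plain text under this name.
        const string path = "SPDS_Test.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        // File.ReadLines reads lazily, which suits a large input file.
        foreach (string record in ExtractRecords(File.ReadLines(path, Encoding.UTF8)))
        {
            Console.WriteLine(record);
        }
    }
}
```

Testing the first character directly, rather than using Contains, avoids matching an R or E that happens to appear in the middle of another line.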

Megha

Try this. It works:

    using (StreamReader sr = new StreamReader("SPDS_Test.doc"))
    {
        string line;
        string data;
        List<string> Data = new List<string>();

        while ((line = sr.ReadLine()) != null)
        {
            // Round-trip the line through UTF-8 before testing it.
            var utf8 = Encoding.UTF8;
            byte[] utfBytes = utf8.GetBytes(line);
            string myString = utf8.GetString(utfBytes, 0, utfBytes.Length);

            try
            {
                // Match the record letter followed by a single space.
                if (myString.Contains("R ") || myString.Contains("E "))
                {
                    data = line;
                    data = data.Remove(0, 1);
                    data = data.Replace(" ", "").Replace("N", "").Replace("+", ",").Replace("·", ",").Replace("?", ",").Replace("(", "").Replace(")", "");
                    Data.Add(data);
                }
            }
            catch (Exception e)
            {
                // Print the exception details.
                Console.WriteLine("-------- {0}", e);
                Console.WriteLine("--------Press any to continue---------");
                Console.ReadKey();
            }
        }

        foreach (string d in Data)
        {
            Console.WriteLine(d);
            Console.ReadKey();
        }
    }
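
When Contains or Substring fails on lines that look correct, it can help to print the code units the reader actually produced for those lines. This is only a diagnostic sketch; LineInspector and DumpCodeUnits are hypothetical helper names, not part of the code above.

```csharp
using System;
using System.Linq;

public static class LineInspector
{
    // Formats the first `count` UTF-16 code units of a line as U+XXXX values,
    // so characters that are invisible on screen show up in the output.
    public static string DumpCodeUnits(string line, int count)
    {
        return string.Join(" ", line.Take(count).Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Prints: U+0052 U+0020 U+0020 U+0058
        Console.WriteLine(DumpCodeUnits("R  X", 4));
    }
}
```

Calling DumpCodeUnits(line, 8) inside the read loop shows what each line really starts with. If the dump shows something other than U+0052 followed by U+0020 on the R lines, for example U+0000 between the letters, the file is probably not in the encoding that the StreamReader assumed.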