Comparing two text files and subtracting respective values

I have 2 text files default.txt and current.txt.

default.txt:

ab_abcdefghi_EnInP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4  Conformity: True
ab_abcdefghi_EnInP025M3TSub.csv FMR: 0.0039 FNMR: 0 SCORE: -14  Conformity: True
ab_abcdefghi_EnInP050M3TSub.csv FMR: 0.01989 FNMR: 0 SCORE: -18  Conformity: True
ab_abcdefghi_EnInP075M3TSub.csv FMR: 0.0029 FNMR: 0 SCORE: -17  Conformity: True
ab_abcdefghi_EnInP090M3TSub.csv FMR: 0.0002 FNMR: 0 SCORE: -7  Conformity: True

current.txt looks like this

ab_abcdefghi_EnUsP005M3TSub.csv FMR: 0.0041 FNMR: 0 SCORE: -14  Conformity: True
ab_abcdefghi_EnUsP025M3TSub.csv FMR: 0.00710000000000001 FNMR: 0 SCORE: -14  Conformity: True
ab_abcdefghi_EnUsP050M3TSub.csv FMR: 0.0287999999999999 FNMR: 0 SCORE: -21  Conformity: True
ab_abcdefghi_EnUsP090M3TSub.csv FMR: 0.0113 FNMR: 0 SCORE: -23  Conformity: True

What I need to do is subtract the values in current from the values in default (default - current) for matching entries.

E.g:

FMR_DIFF = FMR(default) - FMR(test)
FNMR_DIFF = FNMR(default) - FNMR(test)
SCORE_DIFF = SCORE(default) - SCORE(test)

I need to output this in a text file with output looking something like this

O/P:

result:   005M3TSub FMR_DIFF: -0.0032 FNMR_DIFF: 0.023809524 SCORE_DIFF: 10

I am trying to do this in C#. So far I have tried reading the lines of both files, and I was able to compare them, but I cannot work out the logic I need to implement. I am very new to programming. Any help is appreciated.

There are 3 answers

0
A.P.S On BEST ANSWER

It is an interesting problem. Please check the solution below; it works but has not been optimized.

First, we create a simple class to represent one parsed line:

public class DefaultFileStructure
{
    public string FileId;
    public decimal FMR;
    public decimal FNMR;
    public int Score;
    public bool Conformity;
}

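Define the key-name constants used for parsing. These are the filename prefixes that get stripped off so that only the shared ID remains; adjust them to match your files (for the sample data in the question they would be "ab_abcdefghi_EnInP" and "ab_abcdefghi_EnUsP").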

private static string DEFAULT_KN = "tv_rocscores_DeDeP";
private static string TEST_KN    = "tv_rocscores_FrFrP";

Next, parse each file and store the data in a list.

private List<DefaultFileStructure> GetFileStructure(string filePath, string keyName)
{
    List<DefaultFileStructure> fileStructures = new List<DefaultFileStructure>();

    if (!File.Exists(filePath))
    {
        Console.WriteLine("Error in loading the file");
        return fileStructures;
    }

    foreach (string line in File.ReadAllLines(filePath))
    {
        // Skip blank lines; ParseLine expects a full record.
        if (string.IsNullOrWhiteSpace(line)) { continue; }
        fileStructures.Add(ParseLine(line, keyName));
    }

    return fileStructures;
}

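private DefaultFileStructure ParseLine(string line, string keyName)
{
    DefaultFileStructure parsed = new DefaultFileStructure();

    // Split on spaces and tabs; RemoveEmptyEntries collapses the double space before "Conformity:".
    string[] groups = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

    /* Expected layout, assuming the log always uses the same format
       (an ExpandoObject, available since .NET 4.0, would be an alternative):
        0[tv_rocscores_DeDeP005M3TSub.csv]
        1[FMR:]
        2[0.0009]
        3[FNMR:]
        4[0.023809524]
        5[SCORE:]
        6[-4]
        7[Conformity:]
        8[True]
     */

    // Parse with the invariant culture so "0.0009" is read the same way on every machine.
    var culture = System.Globalization.CultureInfo.InvariantCulture;

    // Strip the prefix and the extension so only the shared ID (e.g. "005M3TSub") remains.
    parsed.FileId = groups[0].Replace(keyName, "").Replace(".csv", "");
    parsed.FMR = decimal.Parse(groups[2], culture);
    parsed.FNMR = decimal.Parse(groups[4], culture);
    parsed.Score = int.Parse(groups[6], culture);
    parsed.Conformity = bool.Parse(groups[8]);

    return parsed;
}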

Finally, match the entries, compute the differences, and write the result in the format from your question.

public void getDiff(string FirstFile, string SecondFile, string ResultFile)
{
    // Nothing to compare if either input file is missing.
    if (!File.Exists(FirstFile)) { return; }
    if (!File.Exists(SecondFile)) { return; }

    // Accumulates the output lines.
    StringBuilder ResultBuilder = new StringBuilder();

    // Format numbers with the invariant culture so decimals are always written with a dot.
    var culture = System.Globalization.CultureInfo.InvariantCulture;

    // Parse the default file.
    List<DefaultFileStructure> DefaultList = GetFileStructure(FirstFile, DEFAULT_KN);

    // Parse the test file.
    List<DefaultFileStructure> TestList = GetFileStructure(SecondFile, TEST_KN);

    // For each default entry, look for the test entry with the same ID.
    foreach (DefaultFileStructure defFile in DefaultList)
    {
        bool found = false;
        foreach (DefaultFileStructure testFile in TestList)
        {
            if (defFile.FileId == testFile.FileId)
            {
                found = true;
                ResultBuilder.AppendLine(String.Format(culture, "result: {0} FMR_DIFF: {1} FNMR_DIFF: {2} SCORE_DIFF: {3}", defFile.FileId, defFile.FMR - testFile.FMR, defFile.FNMR - testFile.FNMR, defFile.Score - testFile.Score));
                break;
            }
        }

        // No matching entry in the test file (this also covers an empty test list).
        if (!found)
        {
            ResultBuilder.AppendLine(String.Format("result: {0} FMR_DIFF: {1} FNMR_DIFF: {2} SCORE_DIFF: {3}", defFile.FileId, "N/A", "N/A", "N/A"));
        }
    }

    // Write the collected result to the output file. Exceptions are left to
    // propagate to the caller with their original stack trace.
    using (StreamWriter outfile = new StreamWriter(ResultFile))
    {
        outfile.Write(ResultBuilder.ToString());
    }
}

Call the method like this:

 getDiff(@"I:\Default_DeDe_operational_points_verbose.txt",
         @"I:\FrFr_operational_points_verbose.txt", 
         @"I:\Result.txt");

Thanks, Ajit

6
CodeCaster On

In order to compare the values, you'll first have to parse them. You can create a class that represents a single line of match rates (False Match Rate / False Non-Match Rate):

public class MatchRateLine
{
    public int LineNumber { get; set; }

    public decimal FMR { get; set; }
    public decimal FNMR { get; set; }
    public int Score { get; set; }
    public bool Conformity { get; set; }
}

Then in your parser you can have a method like this:

public List<MatchRateLine> ParseFile(string filename)
{
    var result = new List<MatchRateLine>();

    using (var reader = new StreamReader(filename))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            result.Add(ParseLine(line));
        }
    }

    return result;
}

And one way to do the actual parsing is this:

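public MatchRateLine ParseLine(string line)
{
    var result = new MatchRateLine();

    const string fmrLabel = "FMR: ";
    int fmrPosition = line.IndexOf(fmrLabel);
    int fnmrPosition = line.IndexOf("FNMR: ");

    // Take only the text between the two labels, skipping the "FMR: " label itself.
    int valueStart = fmrPosition + fmrLabel.Length;
    string fmrValueString = line.Substring(valueStart, fnmrPosition - valueStart);

    // The invariant culture makes "0.0009" parse the same way on every machine.
    decimal fmrValue;
    if (decimal.TryParse(fmrValueString, System.Globalization.NumberStyles.Number, System.Globalization.CultureInfo.InvariantCulture, out fmrValue))
    {
        result.FMR = fmrValue;
    }

    // repeat for other values

    return result;
}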

In the parser I have defined a line's FMR value as "the text between 'FMR: ' and 'FNMR: ', parsed as a decimal". You'll have to apply the same logic to each value you want to extract.
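To avoid repeating the IndexOf/Substring arithmetic for every field, the "text after a label" rule can be pulled into one helper. The following is only a sketch: the class name LineFields and the method ValueAfter are made up for illustration, and it assumes each value is a single token that ends at the next whitespace, as in the sample lines from the question.

```csharp
using System;
using System.Globalization;

public static class LineFields
{
    // Hypothetical helper: returns the whitespace-delimited token that
    // follows `label` in `line`, or null when the label is not present.
    public static string ValueAfter(string line, string label)
    {
        int start = line.IndexOf(label, StringComparison.Ordinal);
        if (start < 0) return null;

        start += label.Length;
        // Skip the whitespace between the label and its value.
        while (start < line.Length && char.IsWhiteSpace(line[start])) start++;

        // The value runs until the next whitespace or the end of the line.
        int end = start;
        while (end < line.Length && !char.IsWhiteSpace(line[end])) end++;

        return line.Substring(start, end - start);
    }

    public static void Main()
    {
        string line = "ab_abcdefghi_EnInP005M3TSub.csv FMR: 0.0009 FNMR: 0.023809524 SCORE: -4  Conformity: True";

        // Invariant culture: the files use '.' as the decimal separator.
        decimal fmr = decimal.Parse(ValueAfter(line, "FMR:"), CultureInfo.InvariantCulture);
        decimal fnmr = decimal.Parse(ValueAfter(line, "FNMR:"), CultureInfo.InvariantCulture);
        int score = int.Parse(ValueAfter(line, "SCORE:"), CultureInfo.InvariantCulture);
        bool conformity = bool.Parse(ValueAfter(line, "Conformity:"));

        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "{0} {1} {2} {3}", fmr, fnmr, score, conformity));
    }
}
```

Searching for "FMR:" is safe with this data because neither the filename nor "FNMR:" contains that exact sequence; with different labels you would need to check for such overlaps.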

Now when you have two collections, you can loop over them and compare their values and whatnot:

var defaultLines = Parser.ParseFile("default.txt");
var currentLines = Parser.ParseFile("current.txt");

Your actual question, though, seems to be how to compare specific lines in default and current when it is hard to tell which lines belong together. For example, ab_abcdefghi_EnInP090M3TSub is on line 5 of default, while its counterpart ab_abcdefghi_EnUsP090M3TSub is on line 4 of current (note In/Us).

For this you can extend the MatchRateLine class with a property in which you store the filename, or a meaningful substring of it, so you can find the matching lines in both lists by this value.

You can again use the Substring() method for this, in the ParseLine() method:

// Position:  0123456789012345678901234567890
// Filename: "ab_abcdefghi_EnInP090M3TSub.csv"

result.ReportCode = line.Substring(17, 6);

This will cause the resulting MatchRateLine to have a ReportCode property with the value P090M3.

Given the two lists of lines again:

var p090m3DefaultLine = defaultLines.First(l => l.ReportCode == "P090M3");
var p090m3CurrentLine = currentLines.First(l => l.ReportCode == "P090M3");

var fmrDiff = p090m3DefaultLine.FMR - p090m3CurrentLine.FMR;

Please note that this code makes a lot of assumptions about the format and can throw exceptions when the line being parsed doesn't match that format.
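Looking up one hard-coded ReportCode at a time does not scale to a whole file, so here is a rough sketch of matching every default line to its current counterpart through a dictionary keyed by the report code. The RateLine class and the RateDiff.Diff method are illustrative names, not part of the code above, and the sketch assumes each report code appears at most once per file. Unmatched default lines are reported as N/A, which is one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical, trimmed-down stand-in for MatchRateLine.
public class RateLine
{
    public string ReportCode;
    public decimal FMR;
    public decimal FNMR;
    public int Score;
}

public static class RateDiff
{
    // Builds one output line per default entry (default - current).
    public static List<string> Diff(IEnumerable<RateLine> defaults, IEnumerable<RateLine> currents)
    {
        // Index the current lines by report code for direct lookup.
        var currentByCode = new Dictionary<string, RateLine>();
        foreach (RateLine c in currents)
        {
            currentByCode[c.ReportCode] = c; // a duplicate code overwrites the earlier entry
        }

        var output = new List<string>();
        foreach (RateLine d in defaults)
        {
            RateLine match;
            if (currentByCode.TryGetValue(d.ReportCode, out match))
            {
                output.Add(string.Format(CultureInfo.InvariantCulture,
                    "result: {0} FMR_DIFF: {1} FNMR_DIFF: {2} SCORE_DIFF: {3}",
                    d.ReportCode, d.FMR - match.FMR, d.FNMR - match.FNMR, d.Score - match.Score));
            }
            else
            {
                output.Add(string.Format(
                    "result: {0} FMR_DIFF: N/A FNMR_DIFF: N/A SCORE_DIFF: N/A", d.ReportCode));
            }
        }
        return output;
    }

    public static void Main()
    {
        // Values copied from the first and fourth default lines and the first current line.
        var defaults = new List<RateLine>
        {
            new RateLine { ReportCode = "P005M3", FMR = 0.0009m, FNMR = 0.023809524m, Score = -4 },
            new RateLine { ReportCode = "P075M3", FMR = 0.0029m, FNMR = 0m, Score = -17 }
        };
        var currents = new List<RateLine>
        {
            new RateLine { ReportCode = "P005M3", FMR = 0.0041m, FNMR = 0m, Score = -14 }
        };

        foreach (string line in Diff(defaults, currents))
        {
            Console.WriteLine(line);
        }
        // result: P005M3 FMR_DIFF: -0.0032 FNMR_DIFF: 0.023809524 SCORE_DIFF: 10
        // result: P075M3 FMR_DIFF: N/A FNMR_DIFF: N/A SCORE_DIFF: N/A
    }
}
```

The lines returned by Diff can then be written out in one call, for example with File.WriteAllLines.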

1
Nicolas R On

First you have to decide which lines must appear in the output: every .csv entry from default? Every entry from current? Or both, so that an entry missing from one of the two files still appears in the output?

Once you know that, you can implement your logic:

  • Create a class (named FileLine, for example) with the properties of a line: a string for the name (CsvName), a decimal for FMR (FmrValue), a decimal for FNMR (FnmrValue), and an int for SCORE (ScoreValue)
  • Create a method for the process. It will:
      • Check the structure of the current file: if it is not valid, stop the process
      • Create a new List called defaultLines
      • Create a new List called currentLines
      • Create a string called processedLine (used in a later step)
      • Read the default file: for each line, create a FileLine, parse the line to populate its properties, and add it to defaultLines
      • Read the current file: for each line, create a FileLine, parse the line to populate its properties, and add it to currentLines
      • Then process the comparison (see the code after this list)

    public void comparisonGenerator()
    {
        // HERE: add currentFile check

        // Initialization
        List<FileLine> defaultLines = new List<FileLine>();
        List<FileLine> currentLines = new List<FileLine>();
        string processedLine;

        // HERE: add file reading to populate defaultLines and currentLines

        // Comparison
        foreach (FileLine item in defaultLines)
        {
            // Find the item with the same name (easy with LINQ). SingleOrDefault returns
            // null when there is no match, whereas Single would throw an exception.
            // It still throws if more than one line has the same name.
            FileLine cLine = currentLines.SingleOrDefault(l => l.CsvName.Equals(item.CsvName));
            if (cLine != null)
            {
                processedLine = String.Format("result: {0} FMR_DIFF: {1} FNMR_DIFF: {2} SCORE_DIFF: {3}", item.CsvName, item.FmrValue - cLine.FmrValue, item.FnmrValue - cLine.FnmrValue, item.ScoreValue - cLine.ScoreValue);
                // HERE: add this line to future output
            }
        }

        // When all lines are processed, write the output to a file using FileStream
    }
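
The opening question of this answer (output every entry from default, every entry from current, or both?) comes down to which set of names you iterate over. As a small sketch, the LINQ set operators make each choice explicit; the OutputKeys class and its method names are hypothetical, and the names are assumed to already be reduced to the part the two files share.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OutputKeys
{
    // Names present in both files: only these can produce a real difference.
    public static List<string> InBoth(IEnumerable<string> defaults, IEnumerable<string> currents)
    {
        return defaults.Intersect(currents).ToList();
    }

    // Names present in at least one file: unmatched ones need a placeholder such as N/A.
    public static List<string> InEither(IEnumerable<string> defaults, IEnumerable<string> currents)
    {
        return defaults.Union(currents).ToList();
    }

    // Names that exist in default but have no counterpart in current.
    public static List<string> OnlyInDefault(IEnumerable<string> defaults, IEnumerable<string> currents)
    {
        return defaults.Except(currents).ToList();
    }

    public static void Main()
    {
        // Shared name parts taken from the sample files in the question.
        var defaults = new[] { "P005M3TSub", "P025M3TSub", "P050M3TSub", "P075M3TSub", "P090M3TSub" };
        var currents = new[] { "P005M3TSub", "P025M3TSub", "P050M3TSub", "P090M3TSub" };

        Console.WriteLine("both:         " + string.Join(", ", InBoth(defaults, currents)));
        Console.WriteLine("either:       " + string.Join(", ", InEither(defaults, currents)));
        Console.WriteLine("only default: " + string.Join(", ", OnlyInDefault(defaults, currents)));
    }
}
```

With the sample data, P075M3TSub is the only name that exists in default without a counterpart in current, so it is the one entry whose treatment you have to decide on.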