Python: comparing two Word documents at sentence level with difflib


I have two document files: a Trainee file and a Master file. I need to compare them at the sentence level, not the paragraph level. I am using Python's difflib library, but I am having a hard time getting a line-by-line or word-by-word comparison within each sentence.

See the example below.

Trainee Document

The patient started the menses and had pain in the abdominal area. No pain while eating but pain in the right upper area. Has no vomiting and had fever a month back when woke up sweeting, had chills and no shortness of breath. She also has chest pain and coughing its hurt right side Abdominal area, no constant pain. Reported feel like Cramb but bad stomachache. Reported never received the birth control pills. Not received the covid-19 vaccination yet.

Master Document

Patient did not go to last visit due to working. She started her menses and has periumbilical abdominal pain. Pain started this morning, after waking up. Feeling lightheaded, dizzy, and weak yesterday. Denies any pain with eating. New abdominal pain is different than her previous abdominal pain. Feeling nauseated for about 2 weeks. Denies any trauma or sicknesses in the household. Has had green colored bowel movements for about 1 week. Has been trying to not vomit. Abdominal pain comes and goes, with severe episodes. Eating worsens her pain, drinking water alleviates her pain. Pain feels like cramping. Bumpy roads aggravate her pain.

The comparison result should look like this (image: Result Compare).

I tried difflib's Differ() and HtmlDiff() as shown below, but the result just highlights the whole paragraph once it detects a difference between the two texts. It does not show the differences within each sentence.

import argparse
import difflib
import sys

from pathlib import Path
from typing import Optional


def create_diff(Trainee_file: Path, Master_file: Path, html_output: Optional[Path] = None):
  # Read both files as lists of lines (plain-text input is assumed here).
  with open(Trainee_file) as f:
    file1 = f.readlines()
  with open(Master_file) as f:
    file2 = f.readlines()

  if html_output:
    # Side-by-side HTML report written to the requested path.
    result = difflib.HtmlDiff().make_file(file1, file2, Trainee_file.name, Master_file.name)
    with open(html_output, "w") as f:
      f.write(result)
  else:
    # Unified diff printed to stdout.
    result = difflib.unified_diff(file1, file2, Trainee_file.name, Master_file.name)
    sys.stdout.writelines(result)


def main():
  parser = argparse.ArgumentParser()
  parser.add_argument("Trainee_file_version")
  parser.add_argument("Master_file_version")
  parser.add_argument("--html", help="Specify HTML to write to")
  args = parser.parse_args()

  Trainee_file = Path(args.Trainee_file_version)
  Master_file = Path(args.Master_file_version)

  if args.html:
    output_file = Path(args.html)
  else:
    output_file = None

  # Run the comparison whether or not --html was given.
  create_diff(Trainee_file, Master_file, output_file)


if __name__ == "__main__":
  main()

I really want to solve this problem. I appreciate any help. Thank you in advance.
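One likely reason the whole paragraph gets highlighted is that readlines() returns one element per line, and each paragraph here is a single line, so the paragraph is the smallest unit difflib ever sees. A minimal sketch of one possible direction is to split the text into sentences first and then diff the sentence lists, with a second pass over words for a chosen pair of sentences. It assumes the text has already been extracted to plain strings. The splitter is a naive regex that will mis-split abbreviations such as "Dr.", and the function names split_sentences, sentence_diff and word_changes are hypothetical helpers, not part of difflib.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_diff(trainee_text: str, master_text: str) -> list[str]:
    """Diff two texts using one sentence per sequence element."""
    trainee = split_sentences(trainee_text)
    master = split_sentences(master_text)
    # ndiff marks lines with '- ' (trainee only), '+ ' (master only), '  ' (both).
    return list(difflib.ndiff(trainee, master))


def word_changes(old_sentence: str, new_sentence: str) -> list[tuple[str, str, str]]:
    """Return (tag, old_words, new_words) for every non-equal word-level opcode."""
    old_words = old_sentence.split()
    new_words = new_sentence.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


if __name__ == "__main__":
    trainee = "No pain while eating but pain in the right upper area. Has no vomiting."
    master = "Denies any pain with eating. Has no vomiting."
    for line in sentence_diff(trainee, master):
        print(line)
    print(word_changes("Reported feel like Cramb", "Pain feels like cramping"))
```

The sentence lists from split_sentences could also be passed to difflib.HtmlDiff().make_file() in place of the readlines() output, which should give one table row per sentence rather than per paragraph.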
