pyrouge: "tuple index out of range"


I'm trying to use pyrouge to calculate the similarity between an automated summary and gold-standard summaries. ROUGE processes both sets of summaries without problems, but when it goes on to write the results it complains "tuple index out of range". Does anyone know what causes this problem and how I can fix it?

2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Set ROUGE home directory to D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Writing summaries.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Processing summaries. Saving system files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system and model files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Processing files in D:\ComputerScience\Research\summary\Grendel\automated.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Processing automated.txt.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing files in D:\ComputerScience\Research\summary\Grendel\manual.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing BookRags.txt.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing GradeSaver.txt.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing GradeSummary.txt.
2017-09-13 23:54:57,557 [MainThread  ] [INFO ]  Processing Wikipedia.txt.
2017-09-13 23:54:57,562 [MainThread  ] [INFO ]  Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
Traceback (most recent call last):

  File "<ipython-input-8-bc227b272111>", line 1, in <module>
    runfile('D:/ComputerScience/Research/automate_summary.py', wdir='D:/ComputerScience/Research')

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 707, in runfile
    execfile(filename, namespace)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/ComputerScience/Research/automate_summary.py", line 53, in <module>
    output = r.convert_and_evaluate()

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 361, in convert_and_evaluate
    rouge_output = self.evaluate(system_id, rouge_args)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 331, in evaluate
    self.write_config(system_id=system_id)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 315, in write_config
    self._config_file, system_id)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 264, in write_config_static
    system_filename_pattern = re.compile(system_filename_pattern)

  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 233, in compile
    return _compile(pattern, flags)

  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)

  File "C:\Users\zhuan\Anaconda3\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)

  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)

  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))

  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 616, in _parse
    source.tell() - here + len(this))

error: nothing to repeat

The gold standards are BookRags.txt, GradeSaver.txt, GradeSummary.txt and Wikipedia.txt; the summary to be compared against them is automated.txt.
Shouldn't either *.txt or [a-z0-9A-Z]+ work? The former gives me the "nothing to repeat" error shown in the traceback above, and the latter a "tuple index out of range" error.

from pyrouge import Rouge155

# Raw strings keep the Windows backslashes literal (no escape sequences)
r = Rouge155(r"D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5")
r.system_dir = r'D:\ComputerScience\Research\summary\Grendel\automated'
r.model_dir = r'D:\ComputerScience\Research\summary\Grendel\manual'
r.system_filename_pattern = '[a-z0-9A-Z]+.txt'
r.model_filename_pattern = '[a-z0-9A-Z]+.txt'
output = r.convert_and_evaluate()
print(output)

I'm setting both directories manually, and judging by the log, pyrouge does process the .txt files in them.
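The two patterns fail for different reasons, which can be checked with the standard library alone (`fnmatch` and `re`). `*.txt` is a shell-style glob, not a regular expression: to `re`, a leading `*` has nothing before it to repeat, which is the `error: nothing to repeat` at the bottom of the traceback. The sketch below only uses the asker's filename; the regex `.*\.txt$` is an illustrative glob equivalent, not something pyrouge requires.

```python
import fnmatch
import re

filename = "automated.txt"

# Shell-style glob: "*" means "any characters", so this matches.
print(fnmatch.fnmatch(filename, "*.txt"))

# Regular expression: "*" repeats the previous item, and at position 0
# there is no previous item, hence "nothing to repeat".
try:
    re.compile("*.txt")
except re.error as exc:
    print("re.error:", exc.msg)

# A regex that expresses what the glob meant: any characters, then ".txt".
print(re.match(r".*\.txt$", filename) is not None)
```

Even the corrected regex contains no parentheses, so it defines no capture group; that is what matters for the second error.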


There are 2 answers

Answer by hostingutilities.com (best answer)

The problem is that pyrouge never accounts for the case where your regular expression captures nothing. The problematic line in the pyrouge source code is `id = match.groups(0)[0]`. If you look this up in the Python documentation, it says that `groups()` will "Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern." Your pattern `[a-z0-9A-Z]+.txt` does match the filenames, but it contains no capture groups (no parentheses), so an empty tuple is returned, and the code then tries to take the first item of that empty tuple, which raises the error.
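The failing expression can be reproduced on its own. The first pattern below is the asker's; the second, with parentheses around the part meant to serve as the ID, is an illustrative fix rather than a pattern pyrouge prescribes.

```python
import re

filename = "automated.txt"

# The asker's pattern: it matches, but defines no capture groups.
no_group = re.compile(r"[a-z0-9A-Z]+.txt")
match = no_group.match(filename)
print(match is not None)  # the filename does match
print(match.groups(0))    # an empty tuple: there is nothing to index

try:
    file_id = match.groups(0)[0]  # the expression quoted from pyrouge
except IndexError as exc:
    print("IndexError:", exc)

# Wrapping the ID part in parentheses gives groups() something to return.
with_group = re.compile(r"([a-z0-9A-Z]+)\.txt")
print(with_group.match(filename).groups(0)[0])
```

The last line prints `automated`, the text captured by the group.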

Answer by Pri

I had the same issue with the pyrouge package. It occurs because the source code matches each filename against the pattern we provide and takes the first captured group of that match as the summary ID; when the pattern captures nothing, an empty tuple is returned instead. If you want to know more, take a look at the Rouge155.py file, more specifically the functions write_config_static() and __get_model_filenames_for_id().
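A rough way to picture that pairing step is the simplified sketch below. `pair_summaries` is a hypothetical helper, not pyrouge's code: it works on plain lists of filenames instead of directories, and it assumes the behaviour described above, namely that the first capture group of the system pattern is the ID and that `#ID#` in the model pattern is replaced by that ID. Unlike the real library, it raises a descriptive error when the pattern has no group.

```python
import re


def pair_summaries(system_files, model_files, system_pattern, model_pattern):
    """Hypothetical, simplified sketch of pairing system and model summaries."""
    system_re = re.compile(system_pattern)
    pairs = {}
    for system_file in sorted(system_files):
        match = system_re.match(system_file)
        if not match:
            continue  # filename does not match the pattern: skipped
        groups = match.groups()
        if not groups:
            # The situation that ends in "tuple index out of range" in pyrouge.
            raise ValueError(f"pattern {system_pattern!r} has no capture group for the ID")
        file_id = groups[0]
        # Substitute the ID into the model pattern (escaped here as a precaution).
        model_re = re.compile(model_pattern.replace("#ID#", re.escape(file_id)))
        pairs[system_file] = sorted(f for f in model_files if model_re.match(f))
    return pairs


system = ["SystemSummary.1.txt", "SystemSummary.2.txt"]
models = ["ModelSummary.A.1.txt", "ModelSummary.B.1.txt", "ModelSummary.A.2.txt"]
print(pair_summaries(system, models, r"SystemSummary.(\d+).txt", "ModelSummary.[A-Z].#ID#.txt"))
```

With these example names, SystemSummary.1.txt is paired with the two `.1.` model files and SystemSummary.2.txt with ModelSummary.A.2.txt, while the asker's `[a-z0-9A-Z]+.txt` pattern stops at the missing capture group.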

I resolved it by following the exact filename conventions given on the official pyrouge page:

r.system_filename_pattern = r'some_name.(\d+).txt'

r.model_filename_pattern = 'some_name.[A-Z].#ID#.txt'

So, my suggestion would be to:

  • Create two separate directories: one for the system summaries (system-generated) and one for the model summaries (human-written gold standards)
  • Provide the exact paths to these directories
  • If you are comparing one system summary (say, SystemSummary.1.txt) against a set of model summaries (say, ModelSummary.A.1.txt, ModelSummary.B.1.txt, ModelSummary.C.1.txt), use the following patterns:
      r.system_filename_pattern = r'SystemSummary.(\d+).txt'

      r.model_filename_pattern = 'ModelSummary.[A-Z].#ID#.txt'

You can extend this depending on the number of summaries you want to evaluate.
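Applied to the files from the question, the advice amounts to copying them under convention-following names. The sketch below is one possible way to do that; `prepare_rouge_dirs` is a hypothetical helper, and the mapping of model files to letters A, B, C, ... in sorted filename order is an arbitrary assumption. It only copies files and does not call pyrouge.

```python
import shutil
import string
from pathlib import Path


def prepare_rouge_dirs(system_file, model_files, out_dir, summary_id="1"):
    """Hypothetical helper: copy summaries into pyrouge-style filenames.

    system_file -> <out_dir>/system/SystemSummary.<id>.txt
    model_files -> <out_dir>/model/ModelSummary.<letter>.<id>.txt (A, B, C, ...)
    """
    if len(model_files) > len(string.ascii_uppercase):
        raise ValueError("the [A-Z] model pattern allows at most 26 model summaries per ID")
    out = Path(out_dir)
    system_dir = out / "system"
    model_dir = out / "model"
    system_dir.mkdir(parents=True, exist_ok=True)
    model_dir.mkdir(parents=True, exist_ok=True)

    shutil.copy(system_file, system_dir / f"SystemSummary.{summary_id}.txt")
    # Assign letters in sorted filename order (an arbitrary but stable choice).
    for letter, model_file in zip(string.ascii_uppercase, sorted(model_files)):
        shutil.copy(model_file, model_dir / f"ModelSummary.{letter}.{summary_id}.txt")
    return system_dir, model_dir
```

For the Grendel files this would give SystemSummary.1.txt plus ModelSummary.A.1.txt through ModelSummary.D.1.txt (BookRags, GradeSaver, GradeSummary, Wikipedia). `r.system_dir` and `r.model_dir` would then point at the two returned folders, together with the two patterns from the list above.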

Hope this helps! Good luck!