I'm trying to use pyrouge to calculate the similarity between an automated summary and several gold-standard summaries. ROUGE processes both sets of summaries without problems, but when it writes the result it fails with "tuple index out of range". Does anyone know what causes this problem, and how I can fix it?
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Set ROUGE home directory to D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Writing summaries.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing summaries. Saving system files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system and model files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\automated.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing automated.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\manual.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing BookRags.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSaver.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSummary.txt.
2017-09-13 23:54:57,557 [MainThread ] [INFO ] Processing Wikipedia.txt.
2017-09-13 23:54:57,562 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
Traceback (most recent call last):
  File "<ipython-input-8-bc227b272111>", line 1, in <module>
    runfile('D:/ComputerScience/Research/automate_summary.py', wdir='D:/ComputerScience/Research')
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 707, in runfile
    execfile(filename, namespace)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/ComputerScience/Research/automate_summary.py", line 53, in <module>
    output = r.convert_and_evaluate()
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 361, in convert_and_evaluate
    rouge_output = self.evaluate(system_id, rouge_args)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 331, in evaluate
    self.write_config(system_id=system_id)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 315, in write_config
    self._config_file, system_id)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 264, in write_config_static
    system_filename_pattern = re.compile(system_filename_pattern)
  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 233, in compile
    return _compile(pattern, flags)
  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\zhuan\Anaconda3\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 616, in _parse
    source.tell() - here + len(this))
error: nothing to repeat
The gold standards are BookRags.txt, GradeSaver.txt, GradeSummary.txt, and Wikipedia.txt.
The summary to be compared against them is automated.txt.
Shouldn't either *.txt or [a-z0-9A-Z]+.txt work? The former gives me the "nothing to repeat" error shown above, and the latter gives "tuple index out of range".
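A quick check in plain Python (standard library only; the file list is made up for illustration) shows why the first pattern cannot work: *.txt is shell glob syntax, not a regular expression, and as a regex the leading * has nothing before it to repeat. fnmatch.translate converts a glob into an equivalent regex, but its output contains no capturing group either, so on its own it would not avoid the second error.

```python
import fnmatch
import re

# Illustrative file names only.
filenames = ["automated.txt", "BookRags.txt", "notes.md"]

# "*.txt" is a glob. Compiled as a regex, the leading "*" has nothing
# to repeat, which is the error in the traceback above.
glob_as_regex_error = None
try:
    re.compile("*.txt")
except re.error as exc:
    glob_as_regex_error = str(exc)
print("re.error:", glob_as_regex_error)

# fnmatch.translate turns glob syntax into a real regular expression.
glob_regex = fnmatch.translate("*.txt")
glob_matches = [f for f in filenames if re.match(glob_regex, f)]
print(glob_matches)  # ['automated.txt', 'BookRags.txt']
```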
from pyrouge import Rouge155

r = Rouge155(r"D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5")
r.system_dir = r'D:\ComputerScience\Research\summary\Grendel\automated'
r.model_dir = r'D:\ComputerScience\Research\summary\Grendel\manual'
r.system_filename_pattern = '[a-z0-9A-Z]+.txt'
r.model_filename_pattern = '[a-z0-9A-Z]+.txt'
output = r.convert_and_evaluate()
print(output)
I'm setting both directories manually, and the ROUGE package does seem to process the .txt files in them.
The problem is that pyrouge never accounts for the case where your filename pattern matches a file but contains no capturing group. The problematic line in the pyrouge source code is
id = match.groups(0)[0]
If you look up groups in the Python documentation, it says the method will "Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern."
Your pattern [a-z0-9A-Z]+.txt does match automated.txt, but it has no parenthesized group, so groups() returns an empty tuple. The code then tries to take the first item of that empty tuple, which raises "tuple index out of range".
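To make this concrete, here is a small standard-library sketch. The first part shows the groups() behaviour directly. pair_system_and_models is a hypothetical helper, not part of pyrouge; it is my approximation of the pairing logic in write_config_static (take the first captured group of a system filename as an ID, substitute it for #ID# in the model pattern, then collect the matching model files). The summary.(\d+).txt / summary.[A-Z].#ID#.txt naming mirrors the convention in the pyrouge README as I remember it, so check it against your installed version.

```python
import re

# The pattern from the question matches the filename...
plain = re.match(r"[a-z0-9A-Z]+.txt", "automated.txt")
print(plain is not None)  # True
# ...but it has no (...) group, so groups() is an empty tuple and
# plain.groups(0)[0] would raise IndexError: tuple index out of range.
print(plain.groups(0))    # ()

# With one capturing group, groups(0)[0] is the captured text.
grouped = re.match(r"(automated)\.txt", "automated.txt")
print(grouped.groups(0)[0])  # automated


def pair_system_and_models(system_files, model_files,
                           system_pattern, model_pattern):
    """Hypothetical helper approximating pyrouge's file pairing.

    The first group captured from each system filename is used as an ID,
    and '#ID#' in the model pattern is replaced by that ID before the
    model filenames are matched against it.
    """
    system_re = re.compile(system_pattern)
    pairs = []
    for system_file in sorted(system_files):
        m = system_re.match(system_file)
        if not m:
            continue  # non-matching system files are simply skipped
        file_id = m.groups(0)[0]  # the same indexing that fails without a group
        model_re = re.compile(model_pattern.replace("#ID#", file_id))
        models = sorted(f for f in model_files if model_re.match(f))
        pairs.append((system_file, models))
    return pairs


# The question's files: a model pattern without #ID# matches every gold standard.
gold = ["BookRags.txt", "GradeSaver.txt", "GradeSummary.txt", "Wikipedia.txt"]
pairs = pair_system_and_models(["automated.txt"], gold,
                               r"(automated)\.txt", r"[A-Za-z]+\.txt")
print(pairs)

# README-style numbering: the captured "001" selects only models with that ID.
numbered = pair_system_and_models(
    ["summary.001.txt"],
    ["summary.A.001.txt", "summary.B.001.txt", "summary.A.002.txt"],
    r"summary\.(\d+)\.txt", r"summary\.[A-Z]\.#ID#\.txt")
print(numbered)
```

If that reading of pyrouge is right, adding a capturing group to the system pattern is what avoids the error, for example r.system_filename_pattern = r'(automated)\.txt' together with r.model_filename_pattern = r'[A-Za-z]+\.txt' so that all four gold standards are used for the single system summary. I have not run those two settings against ROUGE-1.5.5 itself, so treat them as an untested suggestion; renaming the files to a numbered scheme is the other option.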