I am trying to implement the minimal finite state transducer described by Mihov and Maurel (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698) in Python3. My program is working but needs a lot of time, so I decided to use cProfile to determine the time-consuming parts. This was the result:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 103.844 103.844 {built-in method builtins.exec}
1 0.000 0.000 103.844 103.844 <string>:1(<module>)
1 0.095 0.095 103.843 103.843 fst.py:16(__init__)
3049 0.007 0.000 103.639 0.034 fst.py:154(find_minimized)
3049 3.388 0.001 103.427 0.034 fst.py:178(member)
2765927 5.173 0.000 99.582 0.000 {built-in method builtins.all}
52885795 54.989 0.000 94.861 0.000 fst.py:184(<genexpr>)
100430746 21.104 0.000 21.104 0.000 fst.py:217(transition)
94725124 18.815 0.000 18.815 0.000 fst.py:224(output)
2248 0.179 0.000 0.205 0.000 fst.py:190(copy_state)
3160 0.026 0.000 0.030 0.000 fst.py:251(clear_state)
2248 0.014 0.000 0.015 0.000 fst.py:140(new_state)
11888 0.011 0.000 0.013 0.000 fst.py:231(set_output)
7494 0.007 0.000 0.013 0.000 fst.py:265(left_string_division)
11593 0.007 0.000 0.009 0.000 fst.py:124(longest_common_prefix)
8498 0.007 0.000 0.009 0.000 fst.py:148(set_transition)
39099 0.006 0.000 0.006 0.000 {method 'add' of 'set' objects}
7118 0.003 0.000 0.004 0.000 fst.py:205(state_output)
50320 0.004 0.000 0.004 0.000 {built-in method builtins.len}
13660 0.003 0.000 0.003 0.000 fst.py:245(final)
9256 0.002 0.000 0.002 0.000 {method 'pop' of 'dict' objects}
3442 0.001 0.000 0.002 0.000 fst.py:238(set_final)
2248 0.001 0.000 0.001 0.000 {method 'keys' of 'dict' objects}
704 0.001 0.000 0.001 0.000 {method 'split' of 'str' objects}
3160 0.001 0.000 0.001 0.000 {method 'discard' of 'set' objects}
547 0.000 0.000 0.000 0.000 fst.py:211(set_state_output)
112 0.000 0.000 0.000 0.000 fst.py:131(new_temp_state)
112 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
So the time-consuming part seems to be the following function (which represents Lemma 3 in Mihov & Maurel) :
def member(self,state) :
for test_state in self.states - set(self.temp_states) :
if (((state in self.final_states and test_state in self.final_states) or \
(state not in self.final_states and test_state not in self.final_states)) and \
(not(state in self.final_states) or (self.state_output(state) == self.state_output(test_state))) and \
(all((self.transition(state,sym) == self.transition(test_state,sym) and \
self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))) :
return test_state
return False
The following part of the if-condition seems to be responsible for the inefficiency:
(all((self.transition(state,sym) == self.transition(test_state,sym) and \
self.output(state,sym) == self.output(test_state,sym)) for sym in self.sigma))
So my question now is how I can look more efficiently for an equivalent state...