I have a rule-based system with several hundred thousand facts, and I'm getting very poor performance with PyCLIPS just loading the facts.
I've narrowed it down to a simple example with two templates and a single rule that joins them (and does nothing else):
import clips
import timeit

env = clips.Environment()
env.BuildTemplate('F1', '(slot x (type INTEGER))')
env.BuildTemplate('F2', '(slot x (type INTEGER))')
# The rule joins F1 and F2 on the shared variable ?val and has an empty RHS.
env.BuildRule('Rule1', '(F1 (x ?val)) (F2 (x ?val))', '')

N = 20000
# Write N facts of each template to its own file.
with open('F1.txt', 'w') as f1:
    with open('F2.txt', 'w') as f2:
        for n in xrange(N):
            print >>f1, '(F1 (x {}))'.format(n)
            print >>f2, '(F2 (x {}))'.format(n)

# Time each load separately; the second load is the one that triggers the join.
print timeit.timeit(lambda: env.LoadFacts('F1.txt'), number=1)
print timeit.timeit(lambda: env.LoadFacts('F2.txt'), number=1)
Output:
0.0951321125031
14.6272768974
So the second batch of 20K facts takes 14.6 seconds to load. Loading the same fact files from the CLIPS console is instantaneous. Checking different values of N reveals that the loading time is roughly proportional to N^2 (making this completely unusable for large numbers of facts).
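For anyone who wants to check the scaling on their own setup, here is a minimal sketch of how the exponent could be estimated from (N, load time) pairs with a least-squares fit on a log-log scale. The function name `fit_exponent` is hypothetical, and the timings in the example are synthetic values generated from t = c * N^2, not measurements; real timings from `timeit` would be substituted in their place.

```python
import math


def fit_exponent(sizes, times):
    """Return the least-squares slope of log(time) against log(size).

    A slope near 1 suggests linear scaling, a slope near 2 quadratic scaling.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic timings that follow t = c * N**2 exactly (illustration only,
# NOT measured values). Replace with real timeit results to test a setup.
sizes = [5000, 10000, 20000]
times = [1e-8 * n ** 2 for n in sizes]
exponent = fit_exponent(sizes, times)
```

With real measurements the fitted value will not be an exact integer, so at least three values of N are worth timing before drawing a conclusion.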
Switching the order of operations and defining the rule after loading the facts does not help: whichever operation comes last is the slow one.
Is anyone familiar with this issue? Am I using PyCLIPS in a wrong way?
I am running PyCLIPS v1.0.7.348 and CLIPS v6.3.
CLIPS 6.3 uses hashing in the joins that compare variables from one pattern to another. This can considerably improve performance when there are a large number of facts and rules similar to the one in your example. In prior versions of CLIPS, when a new F1 fact is asserted, iteration would occur across all F2 facts matching the second pattern (and a similar iteration would occur for each new F2 fact). In version 6.3, iteration occurs only on the facts hashed to the same bucket for the value of ?val.

The Readme page on the PyCLIPS website indicates that it's compiled with CLIPS 6.24, which would explain the difference in performance. Offhand I don't recall any significant API differences between 6.24 and 6.3, so it may be possible to recompile PyCLIPS with the newer version of CLIPS to get the performance improvements.
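To make the difference concrete, here is a simplified model of the two join strategies in plain Python. It is only a sketch: it is not how CLIPS implements its Rete network, the function names are made up for illustration, and it processes the facts in a batch rather than incrementally. It does count the comparisons each strategy performs for the F1/F2 data in the question, which is where the quadratic versus linear behavior comes from.

```python
from collections import defaultdict


def nested_loop_join(f1_values, f2_values):
    """Compare every F1 value with every F2 value (unhashed join model)."""
    comparisons = 0
    matches = []
    for a in f1_values:
        for b in f2_values:
            comparisons += 1
            if a == b:
                matches.append((a, b))
    return matches, comparisons


def hashed_join(f1_values, f2_values):
    """Compare each F1 value only with F2 values in the same hash bucket."""
    buckets = defaultdict(list)
    for b in f2_values:
        buckets[b].append(b)
    comparisons = 0
    matches = []
    for a in f1_values:
        # Only the bucket keyed by this value of ?val is scanned.
        for b in buckets.get(a, ()):
            comparisons += 1
            if a == b:
                matches.append((a, b))
    return matches, comparisons


N = 2000  # smaller than the question's 20000 so the slow version finishes quickly
values = list(range(N))
slow_matches, slow_comparisons = nested_loop_join(values, values)
fast_matches, fast_comparisons = hashed_join(values, values)
```

In this model both functions find the same N matches, but the nested loop performs N * N comparisons while the hashed version performs N, which is consistent with the roughly quadratic load times reported in the question.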