I'm trying to use FLANN from Python with a 6,000,000 x 100 training set matrix. I am building the index with the following commands:
import pyflann
flann = pyflann.FLANN()
params = flann.build_index(trainX, algorithm="autotuned", target_precision=0.9, sample_fraction=0.1, log_level="info")
I get lots of cross-validation results; the final lines show this:
***Earlier results omitted***
KDTree using params: trees=32
Nodes Precision() Time(s) Time/vec(ms) Mean dist
---------------------------------------------------------
1 0.084 0.15658 0.15658 2.9036
2 0.098 0.15296 0.15296 2.097
4 0.137 0.15613 0.15613 1.6675
8 0.213 0.15454 0.15454 1.3968
16 0.324 0.16528 0.16528 1.2296
32 0.445 0.16481 0.16481 1.1253
64 0.567 0.20677 0.20677 1.0767
128 0.685 0.28533 0.28533 1.0437
256 0.777 0.42094 0.42094 1.027
512 0.863 0.73215 0.73215 1.0142
1024 0.916 1.2737 1.2737 1.0062
Start linear estimation
768 0.895 0.94626 0.94626 1.0086
896 0.903 1.073 1.073 1.0077
832 0.899 1.0092 1.0092 1.0081
KDTree buildTime=71.6163, searchTime=1.00915
----------------------------------------------------
Autotuned parameters:
algorithm : 1
trees : 8
----------------------------------------------------
Computing ground truth
Estimating number of checks
Nodes Precision() Time(s) Time/vec(ms) Mean dist
---------------------------------------------------------
1 0.445 3.053 3.053 2.0321
2 0.445 2.7963 2.7963 2.0321
4 0.457 3.2702 3.2702 1.514
8 0.475 3.0651 3.0651 1.321
16 0.503 3.0939 3.0939 1.1812
32 0.527 3.1058 3.1058 1.1132
64 0.57 3.6118 3.6118 1.0734
128 0.608 3.6885 3.6885 1.0467
256 0.642 3.3601 3.3601 1.0306
512 0.672 3.2895 3.2895 1.0187
1024 0.712 3.7659 3.7659 1.0087
2048 0.732 4.7198 4.7198 1.0047
4096 0.745 5.3416 5.3416 1.0027
8192 0.757 7.7298 7.7298 1.0012
16384 0.762 11.114 11.114 1.0006
32768 0.77 17.114 17.114 1.0001
65536 0.772 31.03 31.03 1
131072 0.772 61.822 61.822 1
262144 0.772 133.58 133.58 1
524288 0.773 284.51 284.51 1
1048576 0.773 605.73 605.73 1
2097152 0.773 1437.5 1437.5 1
4194304 0.773 3971.2 3971.2 1
8388608 0.773 6485.3 6485.3 1
16777216 0.773 6476.2 6476.2 1
33554432 0.773 6456.1 6456.1 1
67108864 0.773 6494.2 6494.2 1
It looks like a KD-tree was chosen (algorithm=1) with 8 trees. However, the final series of tests, "Estimating number of checks", appears to be stuck: it has been running overnight and still hasn't finished. I suspect this is because the precision levels off at 0.773 and never reaches my target of 0.9, so the tuner just keeps doubling the number of nodes.
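To make the plateau concrete, here is a minimal sketch in plain Python (it does not call FLANN) that takes the nodes/precision pairs from the "Estimating number of checks" table in the log and reports the best precision reached and where it was first reached. The names `ROWS`, `TARGET`, and `summarize` are my own, for illustration only.

```python
# (nodes, precision) pairs copied from the "Estimating number of checks"
# table in the log output.
ROWS = [
    (1, 0.445), (2, 0.445), (4, 0.457), (8, 0.475), (16, 0.503),
    (32, 0.527), (64, 0.57), (128, 0.608), (256, 0.642), (512, 0.672),
    (1024, 0.712), (2048, 0.732), (4096, 0.745), (8192, 0.757),
    (16384, 0.762), (32768, 0.77), (65536, 0.772), (131072, 0.772),
    (262144, 0.772), (524288, 0.773), (1048576, 0.773), (2097152, 0.773),
    (4194304, 0.773), (8388608, 0.773), (16777216, 0.773),
    (33554432, 0.773), (67108864, 0.773),
]

# The target_precision passed to build_index().
TARGET = 0.9


def summarize(rows, target):
    """Return (best precision, nodes where it first appears, target reached?)."""
    best = max(precision for _, precision in rows)
    first_at_best = next(nodes for nodes, precision in rows if precision == best)
    return best, first_at_best, best >= target


best, first_at_best, reached = summarize(ROWS, TARGET)
print(f"best precision: {best} (first seen at {first_at_best} nodes)")
print(f"target {TARGET} reached: {reached}")
```

On the numbers above, the precision stops improving at 524288 nodes, and the next seven doublings leave it unchanged.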
Can someone advise me on how to proceed? Should I just kill it and manually specify a kd-tree with the "Autotuned parameters"? And what is this "number of checks" that it is calculating anyway? I thought the number of trees and nodes had already been found.