I am working on a research project on search engines, and I am having problems with the performance of a for loop. I have the following problem:
for value in hash_array.keys():
    # One HQL round trip per key; this per-query overhead dominates the runtime.
    cell = db_conn.use_client().hql_query(
        db_conn.use_namespace(),
        'SELECT doc_text FROM SE_doc_text WHERE ROW="' + value + '" SCAN_AND_FILTER_ROWS LIMIT 1').cells
    list_result[value].append(cell[0].value)
Using Hypertable I want the best possible performance, but when I run this over 20,000 or 30,000 elements of hash_array.keys() it takes 5-6 minutes, which is really slow. Is there another way to perform a faster loop, or should I use my plan B for the whole search engine?
Performance of the query:
Elapsed time: 0.10 s
Avg value size: 1428.00 bytes
Avg key size: 57.00 bytes
Throughput: 595190.38 bytes/s
Total cells: 1
Throughput: 400.80 cells/s
Tested on hardware:
8 GB RAM @ 1600 MHz
AMD 1055T
Additional facts:
Without SCAN_AND_FILTER_ROWS: 1.78163504601 ms
With SCAN_AND_FILTER_ROWS: 3.27163504601 ms
Only the for loop: 0.0630 ms
With the solution (from the answer below) I have the following:
# stringBatch is assumed to hold the query prefix,
# e.g. 'SELECT doc_text FROM SE_doc_text WHERE '
stringRow = ' OR '.join(['ROW = "' + value + '"' for value in hash_array.keys()])
cell = db_conn.use_client().hql_query(db_conn.use_namespace(), stringBatch + stringRow).cells
Results: 0.355320930481 ms for 2097 documents and 1.0214779377 ms for 3565 documents.
It is good, but it is not the best solution.
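A further refinement might be to issue the joined query in fixed-size chunks, so the HQL statement stays bounded even for 20,000+ keys. A minimal sketch, assuming the db_conn API from above, a chunk size of 100 (taken from the answer below), and that each returned cell exposes its row key as cell.key.row (an assumption about the Thrift cell layout, not tested code):

keys = list(hash_array.keys())
for i in range(0, len(keys), 100):
    # Build a disjunction for this chunk only (assumed chunk size: 100).
    chunk = keys[i:i + 100]
    stringRow = ' OR '.join('ROW = "' + value + '"' for value in chunk)
    cells = db_conn.use_client().hql_query(db_conn.use_namespace(),
                                           stringBatch + stringRow).cells
    for cell in cells:
        # cell.key.row is assumed to carry the row key of each returned cell.
        list_result[cell.key.row].append(cell.value)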
I've never used Hypertable, but simply reading the documentation suggests the SCAN_AND_FILTER_ROWS clause may be a problem: it only becomes appropriate if you send batches of keys at once. Reading more documentation suggests batching would be possible with the following construct: a query with a disjunction of, say, 100 separate keys:
ROW=x OR ROW=y OR ...
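As a minimal illustration of that construct in Python (the keys x, y, z are hypothetical placeholders; the SELECT prefix is taken from the question):

keys = ['x', 'y', 'z']
where_clause = ' OR '.join('ROW="%s"' % k for k in keys)
# where_clause is now: ROW="x" OR ROW="y" OR ROW="z"
query = 'SELECT doc_text FROM SE_doc_text WHERE ' + where_clause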