Brighthouse optimization in table structure for SELECT query

1k views Asked by At

I have a table with millions of records in MySQL -> imported into Infobright using notes in wiki pages, no problem!

Here is the table syntax

CREATE TABLE `myTable` (
  `a` varchar(255) COLLATE latin1_bin DEFAULT NULL,
  `b` varchar(255) COLLATE latin1_bin DEFAULT NULL,
  `c` bigint(20) NOT NULL,
  `d` bigint(20) NOT NULL,
  `e` int(10) NOT NULL
) ENGINE=BRIGHTHOUSE

now I need to run a select query 450 times, each time a different 'a' is being used a constraint as such:

SELECT d,e FROM `myTable` WHERE a = 'myString';

The goal is to speed up the amount of time all the queries in total is being called. But I am having problems with that. When I run the select query about 450 times, on average it takes 0.52 seconds each query!

When I run it through MySQL however, it takes about 1.7 milliseconds per query!

How can I optimize this to beat MySQL time? Would it require me to use the 'IN' clause, instead of the '=' with an additional selection of 'a' ontop of d,e? eg:

SELECT a,d,e FROM `myTable` WHERE a IN ('myString1','myString2'.... etc )
1

There are 1 answers

0
Noah On BEST ANSWER

Infobright excels at working with numeric data (specifically when it comes to query conditions). They do not support traditional BTREE indexes, as they use a "knowledge grid" meta-data system. When you run WHERE a = 'myString', you're forcing the engine to open up every packrow (usually 50,000 rows of data per packrow) and do a string comparison against each record.

MySQL does have support for the BTREE indexes, which will offer better performance for text-based query conditions like you've noted here. Given that you've said you need to run the query 450 times, I will assume that you have at least 450 unique values for `a`.

If you have a text column like that with a very small set of unique values (like... some sort of status indicator, for example), you would greatly benefit from defining the column as a LOOKUP column

`a` varchar(255) COLLATE latin1_bin DEFAULT NULL COMMENT 'lookup'

NOTE: This sort of table setup offers the best performance for low-cardinality columns, preferably with a small size allocation for varchar(), not 255.

For this set of queries, you would probably be better off sticking with MySQL, or re-working the data to convert these values to numeric data in Infobright.

Since this question is about 4 months old, I'm kinda just posting this as a reference for anybody else that stumbles upon it