Does adding between speed up my SQL query?

160 views Asked by At

Let's say I have a table with 10M enries and when I have this query :

SELECT column_name, count(column_name)
FROM my_table
GROUP BY column_name

it takes a lot of time to complete. If I know that column_name will have a given range, and split my query into different queries where each of them looks like this:

SELECT column_name, count(column_name)
FROM my_table
WHERE column_name BETWEEN value_a AND value_b
GROUP BY column_name

I managed to make it respond faster. For example, by checking the time, if split that into 10 queries, it somewhat responds 10 times faster per query.

So, I guess, imposing that range makes my query go faster.

On another table though, with similar keys and columns, this is not the case. It takes the same time, no matter if I have the between or not.

Some things to note is :

  • first DB is an SQL , second one is an IBM DB
  • in the first case time is measured by a UI tool where I perform the query, in the second time by time in unix.
  • I am not comparing the times, I am just interested in whether using between can lead to reducing my query time?
  • the column_name is not the primary key, but it is part of the primary key (primary key is on multiple columns)

So, my question is, is this the case or it really depends? Does 'between' reduces the query time?

1

There are 1 answers

0
jkavalik On BEST ANSWER

My answer will be mostly about MySQL, but from what I found the BETWEEN operator is optimized too in DB2.

You need to have column_name as FIRST column in some index for BETWEEN to be able to use it in general case.

Where you do complete count/group by there mysql can use ANY index containing the column for full index scan because index is usually smaller (in datasize to read) than entire table.

As a result - if it can use index for BETWEEN then adding it means reading proportionally less rows which equals proportional speedup (not exactly, there are some other overheads). As you work with only one column from entire table, it is "optimal" case for most optimizations. Other queries may behave differently. But having index on most restrictive column really helps processing much less rows which translates to less disk operations etc. so makes query faster.

Learn to use EXPLAIN - there you can see indexes used and estimated numbers of rows to process among other info.