Composite index on three columns

1.5k views Asked by At

We have got 93M rows mapping table which is keeping mapping information of three tables and corresponding three tables. We are facing performance issues in accessing data from the mapping table.

TableName PK information Number of Rows
Seller SellerId Primary Key 3000 rows
Store StoreId Primary Key 20000 rows
Product ProductId Primary Key 200k rows
SellerStoreProductMapping SellerId,ProductId,StoreId Composite non-clustered index. There is also one surrogate key in this table: SellerStoreProductMappingId which is used as Primary Key clustered 93M rows

Our queries can access any of the three combination : Seller, Product, Store in the 93M rows table.

My actual query is like this:

SELECT < many columns from four tables>
FROM SellerStoreProductMapping
INNER JOIN Store
INNER JOIN Seller
INNER JOIN Product ...
WHERE SellerId = 123

But, what is happening is the non-clustered index: SellerId,ProductId,StoreId is rarely used, in our queries, even if we are having filter on SellerId. It goes for index scan for storeId.

Our doubt is, for these three column combination,

  1. should we go for separate non-clustered indexes (3 indexes) ?
  2. should we go for two column non-clustered composite indexes (4 indexes) ?
  3. should we go for three column composite non-clustered indexes (9 indexes). Instead of 9, We will limit to specific usage scenarios. ?

Note: We cannot create clustered column store index, as we are having ROWVERSION datatype as one of the datatypes in the mapping table.

3

There are 3 answers

2
SQLpro On BEST ANSWER

Your index will be used systematically if this index is covering all the information in the query and if predicate is sargable.

As an example, let us see those queries :

--1
SELECT *
FROM   SellerStoreProductMapping
WHERE  Seller = 1 AND Product = 2 AND Store = 1
-- 2
SELECT Seller, Product, Store
FROM   SellerStoreProductMapping
WHERE  Seller = 1 AND Product = 2 AND Store = 1
-- 3
SELECT anyOtherColumns
FROM   SellerStoreProductMapping
WHERE  Seller = 1 AND Product = 2 AND Store = 1
-- 4
SELECT Seller, Product, Store
FROM   SellerStoreProductMapping
WHERE  Seller = 1 AND Product = 2 AND Store = 1
ORDER  BY anyOtherColumns

Only query 2 will systematicaly use the index. All the other queries (1, 3, 4) does not have all the columns used in the query, in the index key... So they must use a double read :

  • first read (seek in the index) to find candidate rows that are qualified under the predicate
  • second read in the table to find the columns that the index does not have

The cost of the two reads is compare to the cost of other strategies like scanning the table. If the cost of the scan is lower, the index won't be used...

2
Charlieface On

A few rules of thumb to remember with regards to column order in indexes:

  • Place first in the index key any columns referenced in WHERE = predicate, and any columns referenced in unique joins (the join returns a single row from the other table)
  • Then any range predicates, such as > <= < <> BETWEEN
  • Then other join columns
  • Then any ordering columns
  • Don't forget to add INCLUDEs for any other columns you are selecting
  • If you have a choice between two columns on the same level, pick the one with the most selectivity first.

So if your query is on a single Seller and Store, but many Products, then you need an index (Seller, Store, Product) or (Store, Seller, Product), depending on how selective the columns are.

Further notes:

  • If you are selecting columns which are not included in the index, the compiler may decide to do a clustered index scan instead, as the cost of extra key lookups may be not worth it.

  • The compiler can detect a unique join table only if it can calculate that the join is on a unique row. So always make sure unique columns are declared as such, either with PRIMARY/UNIQUE KEY or with a unique index.

  • In your case, your join table has an extra surrogate primary key, which I believe is unnecessary, as the three other columns uniquely define the row. Even if you need it, you don't have to cluster on it. You can use the clustered index (which acts as an index with all columns included) completely separate from the PK.

5
Máté Gábor On

This may not be the answer (but I am not allowed to comment), still one point to consider: the order of the indices in the composite index.

If you define your composite index in the SellerStoreProductMapping table in this order: SellerId,StoreId,ProductId, then it can only be efficiently used for queries that filter for SellerID OR for (SellerId and StoreId) OR for (SellerId and StoreId and ProductId)

If you have every possible combinations of the three columns (7 possibilities) as filters in your queries, then you will probably need to define at least three separate indices.

Some reference to this can be found here: https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys