CosmosDB - does distinct scan indexe for the field or scan all records?

194 views Asked by At

When there is a table with records of this format :

{ "productId" : "123" "productCategory" : "fmcg" }

There can be millions of such product records spread across a few hundred productCategory.

If I do a select distinct c.productCategory from c on this table, does it just do a scan on the index it has created for productCategory (is there such index??), or does it go through the millions of records to figure out the distinct productCategory?

2

There are 2 answers

0
Jerald Baker On BEST ANSWER

It looks like Microsoft has optimized the way Distinct, Group By, OFFSET LIMIT and JOIN operators work by leveraging the index better.

This feature was out around June 21.

More details:-

https://feedback.azure.com/d365community/idea/8d3cad4c-0e25-ec11-b6e6-000d3a4f0858

https://devblogs.microsoft.com/cosmosdb/introducing-a-new-system-function-and-optimized-query-operators/

1
NotFound On

At this moment Cosmos won't leverage your index to perform the DISTINCT in your query. It goes through all documents that match your filter and therefore will increase in RU linearly the more documents it'll have to scan.

It's however something that is being worked on according to the Azure Share Your Ideas board.