Everyone warns not to query against anything other than RowKey or PartitionKey in Azure Table Storage (ATS), lest you be forced into a table scan. For a while, this paralyzed me: I kept trying to come up with exactly the right PK and RK, and building pseudo-secondary indexes in other tables whenever I needed to query on anything else.
However, it occurs to me that in SQL Server I would routinely table scan when it seemed appropriate.
So the question becomes: how fast can I table scan an Azure Table? Is the rate a constant in entities/second, or does it depend on record size, etc.? Are there any rules of thumb for how many records is too many to table scan if you want a responsive application?
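For concreteness, here's a rough way to measure your own scan rate with the azure-data-tables Python SDK. This is just a sketch: the table name and the connection-string environment variable are placeholders you'd swap for your own.

```python
import os
import time

from azure.data.tables import TableClient

# Placeholders: substitute your own connection string and table name.
client = TableClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    table_name="mytable",
)

# An unfiltered list_entities() is a full table scan.
start = time.perf_counter()
count = sum(1 for _ in client.list_entities())
elapsed = time.perf_counter() - start

print(f"scanned {count} entities in {elapsed:.1f}s "
      f"({count / elapsed:.0f} entities/s)")
```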
The issue with a table scan is crossing partition boundaries. The level of performance you are guaranteed is explicitly set at the partition level. Therefore, when you run a full table scan, it's (a) not very efficient and (b) has no guarantee of performance. The partitions themselves live on separate storage nodes, so a cross-partition scan potentially consumes massive amounts of resources (tying up multiple nodes simultaneously).
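To illustrate the difference, a sketch using the azure-data-tables Python SDK: the first query names a PartitionKey, so the service can route it to a single partition server where the per-partition throughput target applies; the second cannot, so the service has to walk every partition range. The table name, key, and property in the filters are hypothetical.

```python
import os

from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],  # placeholder
    table_name="orders",  # hypothetical table
)

# Single-partition query: the PartitionKey pins this to one partition
# server, so partition-level performance targets apply.
in_partition = client.query_entities(
    query_filter="PartitionKey eq @pk",
    parameters={"pk": "customer-42"},  # hypothetical key
)

# No PartitionKey in the filter: the service must walk every partition
# range across nodes, with no overall performance guarantee.
cross_partition = client.query_entities(
    query_filter="Amount gt 100"  # hypothetical property
)

print(sum(1 for _ in in_partition), sum(1 for _ in cross_partition))
```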
I believe that crossing these boundaries also produces continuation tokens, which require additional round trips to storage to retrieve the full result set. This reduces performance and increases the transaction count (and consequently the cost).
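A sketch of how that surfaces in the azure-data-tables Python SDK: each page of results is a separate round trip (and a separate billable transaction), with the continuation token carried between pages for you. Table name and connection string are again placeholders.

```python
import os

from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],  # placeholder
    table_name="mytable",  # placeholder
)

# A full scan is served in pages; by_page() exposes the page boundaries,
# each of which corresponds to one round trip to the service.
round_trips = 0
entities = 0
pages = client.list_entities(results_per_page=1000).by_page()
for page in pages:
    round_trips += 1
    entities += sum(1 for _ in page)
    # pages.continuation_token holds the marker the service returned;
    # it is None once the scan is complete.

print(f"{entities} entities fetched in {round_trips} round trips")
```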
If the number of partitions/nodes you're crossing is fairly small, you likely won't notice any issues.
But please don't quote me on this. I'm not an expert on Azure Storage. It's actually the area of Azure I'm the least knowledgeable about. :P