Suppose I have three columns, A, B, C. They each have a range of x, y and z possible values respectively.
Does an index on all three columns have a size proportional to x * y * z?
Suppose I have three columns, A, B, C. They each have a range of x, y and z possible values respectively.
Does an index on all three columns have a size proportional to x * y * z?
No. The size of an
INDEX
is (roughly)N = Number of rows in the entire table.
L = Length (in bytes) of the values in all the columns of the index, plus columns in the
PRIMARY KEY
.overhead = various pointer, lengths, padding, etc
Example:
CREATE TABLE ... id INT PRIMARY KEY, A INT, INDEX(A) ...
INT
is a 4-byte datatype. It can hold more than 4 billion distinct values. If there are 100 rows in the table, let's look at the BTree holding the secondaryINDEX(A)
.N * L = 800, but once the overhead is added, and use the blocking, it will take 16KB. (Note: InnoDB allocates data and indexes in "blocks" of 16KB.)
Now add to that table
The
(2+10)
: 2 for the "length" of the string; 10, on average, for the actual string. (In some cases, the "2" is really "1" and if you are using utf8, each character could be multiple bytes.)If that table grows to 1 million rows, the index may take 50MB, a lot of it being unavoidable "overhead".
A major exception:
For InnoDB, the size of the
PRIMARY KEY
is virtually zero since it is "clustered" with the data. Actually, there is about 1% extra for the non-leaf nodes in that BTree and some 'overhead'.