I have a huge InnoDB table (>1TB, >1B rows) that I would like to shard: I would like to make multiple smaller independent tables from that big table.
How to do that ?
What I've already tried:
- Moving rows to partitions by SELECTing them from the original table, and INSERTing them to the partitions. This takes ages, and keeping rows in sync during the operation is hard (but seems to be doable with triggers, as long as the partition is on the same server). I haven't found a ready to use tool to do that.
- Copying the entire table and then deleting the rows that do not belong to the partition. Still very slow, especially given the size of the table. This is what MySQL Fabric does, apparently.
Random crazy ideas:
- Splitting the
.idb
file offline, and importing it into the server, but I don't know if a tool capable of doing this exists.
SHOW CREATE TABLE:
CREATE TABLE `Huge` (
`account_id` int(11) NOT NULL,
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`foo` varchar(255) NOT NULL,
`bar` int(11) NOT NULL,
`baz` char(2) NOT NULL,
PRIMARY KEY (`account_id`,`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
Sharding key would be account_id. The primary key currently is (account_id,id)
so that rows are clustered by account_id
.
Is there any tool automating this task ? Is there any better way of doing this ?
"Sharding" is splitting data (usually one table) across multiple servers. "Partitioning" is splitting a table into multiple subtables on the same server. Which are you doing? Fabric does sharding. Your comment about .ibd implied you are thinking about partitioning.
Assuming that you want to
PARTITION
one huge table, I need to first ask why. This is a serious question because most (I do mean most) people think that partitioning will magically produce some benefit, when in reality it won't. I believe there are only 4 use cases for partitioning. Does your situation fall into any of them?If, on the other hand, you want Sharding, then please provide
SHOW CREATE TABLE
and discuss which column you would shard on.Edit (after getting clarity on goals)
I hope you do not have explicit
FOREIGN KEYs
; they will not work with either partitioning or sharding.is a problem in two ways.
There is no need to make
id
UNIQUE
; the only requirement for anAUTO_INCREMENT
is that it be the first column in some index. So this would be less burden on the system:INDEX(id)
An
AUTO_INCREMENT
does work forPARTITIONed
table, but it does not work for a sharded table. You will need to assess the purpose ofid
. Either it is OK for it to be unique only within one shard, then there is no real problem. Ifid
needs to be unique across all shards, that is more difficult. Probably the only solution is to have a technique for getting id(s) from some central server. However, that leads to a single-point-of-failure and a bottleneck. If you need to go that direction, I can advise on how to avoid those drawbacks.How to migrate to the ultimate, sharded, system?
I do not recommend any digression into
PARTITIONing
, it won't really help in the long run. And the cost ofREORGANIZE PARTITION
is high -- copying all the rows over, both the extracted rows and the remaining rows. Edit: If you do use partitions, usept-online-schema-change
to do the splitting with minimal downtime.Instead, I recommend perfecting a tool that will migrate one
account_id
from one shard to another. This has to be custom code, because there are probably other tables (and references between tables) that will be affected by moving an account to a different server. In the long run, this will be useful for load balancing, hardware upgrades, software upgrades, and even schema changes. When you need to change something, create a new shard(s) with the new OS/version/schema/whatever and migrate users to it.The simple approach to this tool is
DELETE
rows on the old shardThis is not much of a problem if an account is "small". But if you need minimal downtime (writes blocked), then we can discuss a more complex approach.
(In case you have not guessed, I have 'been there, done that'.)