Background
I have spent a couple of days trying to figure out how to handle large amounts of data in MySQL, and I have selected some programs and techniques for the software's new server. It will probably run Ubuntu 14.04 LTS with nginx and Percona Server, using TokuDB for the 3 tables I have planned and InnoDB for the rest of the tables.
Yet the major problem remains unresolved: how should I handle the huge amount of data in the database?
Data
I estimate I will receive about 500 million rows of data per year. I will be receiving measurement data from sensors every 4 minutes (at one reading per sensor every 4 minutes, that is roughly 131,000 readings per sensor per year, so this volume would correspond to something on the order of 3,800 sensors).
Requirements
Insertion speed is not very critical, but I want to be able to select a few hundred measurements in 1-2 seconds. The amount of required server resources is also a key factor.
Current plan
I am now thinking of splitting the sensor data into 3 tables.
EDIT: On every table:
id = PK, AI
sensor_id will be indexed
-- Raw 4-minute readings, kept for 1 month.
CREATE TABLE measurements_minute (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  value float,
  sensor_id mediumint(8) NOT NULL,
  created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_sensor_id (sensor_id)
) ENGINE=TokuDB;

-- Hourly averages, kept for 1 year.
CREATE TABLE measurements_hour (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  value float,
  sensor_id mediumint(8) NOT NULL,
  created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_sensor_id (sensor_id)
) ENGINE=TokuDB;

-- Daily averages.
CREATE TABLE measurements_day (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  value float,
  sensor_id mediumint(8) NOT NULL,
  created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_sensor_id (sensor_id)
) ENGINE=TokuDB;
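Since my typical select fetches a few hundred measurements for one sensor over a time range, I might also add a composite index on (sensor_id, created) rather than relying on the sensor_id index alone. A minimal sketch (the sensor id and date range are placeholders):

ALTER TABLE measurements_minute
  ADD KEY idx_sensor_created (sensor_id, created);

-- Typical query this index would cover: one sensor over a time window.
SELECT created, value
FROM measurements_minute
WHERE sensor_id = 42
  AND created >= '2014-05-01'
  AND created <  '2014-05-02'
ORDER BY created;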
So I would store the 4-minute data for one month. Once the data is 1 month old, hourly averages would be calculated from the minute values and inserted into the measurements_hour table, and the original rows would be deleted from the minute table. Likewise, when the data is 1 year old, daily averages would be stored in the measurements_day table and the hour data deleted.
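For example, the roll-up could be a plain INSERT ... SELECT followed by a DELETE. A sketch, assuming hourly buckets per sensor and that averages are written before the source rows are removed:

-- Fix the cutoff once so both statements see the same boundary.
SET @cutoff = NOW() - INTERVAL 1 MONTH;

-- Aggregate minute rows older than 1 month into hourly averages.
INSERT INTO measurements_hour (value, sensor_id, created)
SELECT AVG(value), sensor_id,
       DATE_FORMAT(created, '%Y-%m-%d %H:00:00')
FROM measurements_minute
WHERE created < @cutoff
GROUP BY sensor_id, DATE_FORMAT(created, '%Y-%m-%d %H:00:00');

-- Only after the averages are safely written, remove the source rows.
DELETE FROM measurements_minute
WHERE created < @cutoff;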
Questions
Is this considered a good way of doing this? Is there something else to take into consideration? How about table partitioning, should I do that? And how should I execute the splitting of the data into the different tables: triggers and procedures?
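For instance, I could RANGE-partition the minute table by month, so the monthly purge becomes a cheap DROP PARTITION instead of a huge DELETE. A sketch, assuming the storage engine supports MySQL partitioning; note that MySQL requires the partitioning column to be part of every unique key, so the primary key would have to become (id, created). Partition names and dates are illustrative:

ALTER TABLE measurements_minute
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, created);

ALTER TABLE measurements_minute
  PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p201405 VALUES LESS THAN (TO_DAYS('2014-06-01')),
    PARTITION p201406 VALUES LESS THAN (TO_DAYS('2014-07-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );

-- Purging a whole month is then a fast metadata operation:
ALTER TABLE measurements_minute DROP PARTITION p201405;

And instead of triggers, the roll-up could live in a stored procedure fired by the event scheduler (event_scheduler must be ON); rollup_minute_to_hour() here is a hypothetical procedure wrapping the INSERT ... SELECT above:

CREATE EVENT ev_rollup_minute
  ON SCHEDULE EVERY 1 DAY
  DO CALL rollup_minute_to_hour();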
EDIT: My ideas
Any idea if MonetDB or Infobright would be any good for this?
I had to solve that type of problem before, with nearly a million rows per hour.
Some tips:
Engine MyISAM. You don't need to update rows or manage transactions in these tables; you are only going to insert values, select them, and eventually delete them.
Be careful with the indexes. In my case insertion speed was critical, and sometimes the MySQL queue was full of pending inserts. An insert takes more time the more indexes the table has. Which indexes you need depends on your calculated values and on when you are going to compute them.
Shard your buffer tables. I only triggered the calculations when a table was ready: while I was calculating the values from the buffer_a table, the insertions were going to the buffer_b one, and vice versa. In my case I calculated the values every day, so I switched the destination table every day (a sketch of such a switch is below). In fact, I dumped all the data and exported it to another database to compute the averages and do the other processing without disturbing the inserts.