How to handle large amounts of data in a MySQL database?


Background

I have spent a couple of days trying to figure out how I should handle large amounts of data in MySQL. I have selected some programs and techniques for the new server for the software. I am probably going to use Ubuntu 14.04 LTS running nginx and Percona Server, with TokuDB for the 3 tables I have planned and InnoDB for the rest of the tables.

Yet I still have the major problem unresolved: how should I handle the huge amount of data in the database?

Data

My estimate for the incoming data is about 500 million rows a year. I will be receiving measurement data from sensors every 4 minutes.

Requirements

Insertion speed is not very critical, but I want to be able to select a few hundred measurements in 1-2 seconds. Also, the amount of required resources is a key factor.

Current plan

Now I have thought of splitting the sensor data into 3 tables.

EDIT: On every table:

id = PK, AI

sensor_id will be indexed

CREATE TABLE measurements_minute(
  id bigint(20),
  value float,
  sensor_id mediumint(8),
  created timestamp
) ENGINE=TokuDB;

CREATE TABLE measurements_hour(
  id bigint(20),
  value float,
  sensor_id mediumint(8),
  created timestamp
) ENGINE=TokuDB;

CREATE TABLE measurements_day(
  id bigint(20),
  value float,
  sensor_id mediumint(8),
  created timestamp
) ENGINE=TokuDB;

So I would be storing this 4-minute data for one month. Once the data is 1 month old, hourly averages would be calculated from the minute values and inserted into the measurements_hour table, and the old rows would be deleted from the minute table. Likewise, once the hour data is 1 year old, it would be deleted and daily averages would be stored in the measurements_day table. A sketch of that roll-up is below.
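A minimal sketch of what the monthly roll-up could look like, assuming id is AUTO_INCREMENT as noted in the edit above; the hour bucketing via DATE_FORMAT and the 1-month cutoff are only illustrative:

-- Aggregate month-old minute data into hourly averages, then remove the raw rows.
INSERT INTO measurements_hour (value, sensor_id, created)
SELECT AVG(value), sensor_id, DATE_FORMAT(created, '%Y-%m-%d %H:00:00')
FROM measurements_minute
WHERE created < NOW() - INTERVAL 1 MONTH
GROUP BY sensor_id, DATE_FORMAT(created, '%Y-%m-%d %H:00:00');

DELETE FROM measurements_minute
WHERE created < NOW() - INTERVAL 1 MONTH;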

Questions

Is this considered a good way of doing this? Is there something else to take into consideration? How about table partitioning, should I do that? How should I execute the splitting of the data into different tables? Triggers and procedures?
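For example (just a sketch; rollup_minute_to_hour here would be a stored procedure wrapping the roll-up statements above, and the name is only a placeholder), the event scheduler could run the job once a day instead of a trigger:

SET GLOBAL event_scheduler = ON;

CREATE EVENT ev_rollup_minute_to_hour
  ON SCHEDULE EVERY 1 DAY
  DO CALL rollup_minute_to_hour();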

EDIT: My ideas

Any idea if MonetDB or Infobright would be any good for this?


There are 2 answers

fmgonzalez

I had to solve that type of problem before, with nearly a million rows per hour.

Some tips:

Engine MyISAM. You don't need to update rows or manage transactions in these tables. You are going to insert the values, select them, and eventually delete them.

Be careful with the indexes. In my case insertion speed was critical, and sometimes the MySQL queue was full of pending inserts. An insert takes longer the more indexes the table has. Which indexes you need depends on the values you calculate and when you calculate them.

Shard your buffer tables. I only triggered the calculations when a table was ready. While I was calculating values from the buffer_a table, the insertions were going into buffer_b, and vice versa. In my case I calculated the values every day, so I switched the destination table every day. In fact, I dumped all the data and exported it to another database to compute the averages and other processing without disturbing the inserts. A sketch of that swap is below.
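A rough sketch of that double-buffer idea (buffer_a and buffer_b are just placeholder names):

-- Two identical buffer tables; inserts always target buffer_a by convention.
CREATE TABLE buffer_a LIKE measurements_minute;
CREATE TABLE buffer_b LIKE measurements_minute;

-- RENAME TABLE swaps both names atomically, so new inserts immediately land in
-- the empty table while the filled one (now buffer_b) can be aggregated and
-- truncated without blocking the insert stream.
RENAME TABLE buffer_a TO buffer_swap,
             buffer_b TO buffer_a,
             buffer_swap TO buffer_b;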

I hope you find this helpful.

tmcallaghan

I have a few suggestions, and further questions.

  1. You have not defined a primary key on your tables, so MySQL will create one automatically. Assuming that you meant for "id" to be your primary key, you need to change the line in all your table create statements to be something like "id bigint(20) NOT NULL AUTO_INCREMENT PRIMARY KEY,".

  2. You haven't defined any indexes on the tables; how do you plan on querying? Without indexes, all queries will be full table scans and likely very slow.

  3. Lastly, for this use case, I'd partition the tables to make the removal of old data quick and easy. A combined sketch of these three points follows below.
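Putting the three points together, one possible sketch (partition names and boundaries are just examples; MySQL requires the partitioning column to be part of every unique key, hence the composite primary key):

CREATE TABLE measurements_minute(
  id bigint(20) NOT NULL AUTO_INCREMENT,
  value float,
  sensor_id mediumint(8),
  created timestamp NOT NULL,
  PRIMARY KEY (id, created),
  KEY idx_sensor_created (sensor_id, created)
) ENGINE=TokuDB
PARTITION BY RANGE (UNIX_TIMESTAMP(created)) (
  PARTITION p2014_06 VALUES LESS THAN (UNIX_TIMESTAMP('2014-07-01 00:00:00')),
  PARTITION p2014_07 VALUES LESS THAN (UNIX_TIMESTAMP('2014-08-01 00:00:00')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Dropping a whole partition removes a month of old data almost instantly,
-- compared to a large DELETE:
ALTER TABLE measurements_minute DROP PARTITION p2014_06;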