I'm working on an analysis assignment; we got a partial dataset from the university library containing almost 300,000,000 rows.
Each row contains:
- ID
- Date
- Owner
- Deadline
- Checkout_date
- Checkin_date
I put all of this into a MySQL table and started querying it for my analysis assignment. However, a simple query (SELECT * FROM table WHERE ID = something) was taking 9-10 minutes to complete. So I created an index on all the columns, which made it noticeably faster (~30 seconds).
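To illustrate what an index does for a point lookup like this, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL (the table and index names are just illustrative, and the principle carries over: an index turns a full table scan into a B-tree search):

```python
import sqlite3

# In-memory stand-in for the library table described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE loans (ID INTEGER, Date TEXT, Owner TEXT, "
    "Deadline TEXT, Checkout_date TEXT, Checkin_date TEXT)"
)
cur.executemany(
    "INSERT INTO loans (ID) VALUES (?)",
    ((i % 5000,) for i in range(100000)),
)

# Without an index, looking up one ID scans the whole table.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM loans WHERE ID = 123"
).fetchall()
print(plan)  # plan shows a table scan

# A single-column index changes the plan to an index search.
cur.execute("CREATE INDEX idx_loans_id ON loans (ID)")
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM loans WHERE ID = 123"
).fetchall()
print(plan)  # plan shows the index being used
```

In MySQL the equivalent check is `EXPLAIN SELECT * FROM table WHERE ID = something`, which reports whether the query uses an index or a full scan.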
So I started reading about similar issues, and people recommended switching to a "wide column store" or "search engine" instead of a relational database.
So my question is, what would be the best database engine to use for this data?
The answer is, of course, "it depends". In your example, you're looking up records in the database by a given ID. I find it hard to believe that would take 30 seconds in MySQL with the column indexed, unless you're on some sluggish laptop.
MySQL has powered an incredible number of systems because it is full-featured, stable, and has pretty good performance. It's bad (or has been bad) at some things, like text search, clustering, etc.
Systems like Elasticsearch are good with globs of text, but still may not be a good fit for your system, depending on usage. From your schema, you have one text field ("Owner"), and you wouldn't need Elasticsearch's text-searching capabilities on a field like that (who ever needed to stem a user name?). Elasticsearch is also used widely for log files, which also don't need a text engine. It is, however, good with blocks of text and with clustering.
If this is a class assignment, I'd stick with MySQL.