Database design to enable Multiple tags like Stackoverflow?

5.3k views Asked by At

I have the following tables.

Articles table
a_id INT primary unique
name VARCHAR
Description VARCHAR
c_id INT

Category table
id INT
cat_name VARCHAR

For now I simply use

SELECT a_id,name,Description,cat_name FROM Articles LEFT JOIN Category ON Articles.a_id=Category.id WHERE c_id={$id}

This gives me all articles which belong to a certain category along with category name.
Each article is having only one category.

AND I use a sub category in a similar way(I have another table named sub_cat).
But every article doesn't necessary have a sub category.It may belong to multiple categories instead.

I now think of tagging an article with more than one category just like the questions at stackoverflow are tagged(eg: with multiple tags like PHP,MYSQL,SQL etc).

AND later I have to display(filter) all article with certain tags(eg: tagged with php,php +MySQL) and I also have to display the tags along with the article name,Description.
Can anyone help me redesign the database?(I am using php + MySQL at back-end)

2

There are 2 answers

15
RBarryYoung On BEST ANSWER

Create a new table:

CREATE TABLE ArticleCategories(
    A_ID INT,
    C_ID INT,
    Constraint PK_ArticleCategories Primary Key (Article_ID, Category_ID)
)

(this is the SQL server syntax, may be slightly different for MySQL)

This is called a "Junction Table" or a "Mapping Table" and it is how you express Many-to-Many relationships in SQL. So, whenever you want to add a Category to an Article, just INSERT a row into this table with the IDs of the Article and the Category.

For instance, you can initialize it like this:

INSERT Into ArticleCategories(A_ID,C_ID)
    SELECT A_ID,C_ID From Articles

Now you can remove c_id from your Articles table.

To get back all of the Categories for a single Article, you would do use a query like this:

SELECT a_id,name,Description,cat_name 
FROM Articles 
LEFT JOIN  ArticleCategories ON Articles.a_id=ArticleCategories.a_id 
INNER JOIN Category ON ArticleCategories.c_id=Category.id 
WHERE Articles.a_id={$a_id}

Alternatively, to return all articles that have a category LIKE a certain string:

SELECT a_id,name,Description
FROM Articles 
WHERE EXISTS(   Select * 
                From ArticleCategories 
                INNER JOIN Category ON ArticleCategories.c_id=Category.id 
                WHERE Articles.a_id=ArticleCategories.a_id 
                  AND Category.cat_name LIKE '%'+{$match}+'%'
             )

(You may have to adjust the last line, as I am not sure how string parameters are passed MySQL+PHP.)

2
Raymond Nijland On

Ok RBarryYoung you asked me about an reference/analyse you get one

This reference / analyse is based off the documention / source code analyse off the MySQL server

INSERT Into ArticleCategories(A_ID,C_ID)
    SELECT A_ID,C_ID From Articles

On an large Articles table with many rows this copy will push one core off the CPU to 100% load and will create a disk based temporary table what will slow down the complete MySQL performance because the disk will be stress out with that copy. If this is a one time process this is not that bad but do the math if you run this every time..

SELECT a_id,name,Description
FROM Articles 
WHERE EXISTS(   Select * 
                From ArticleCategories 
                INNER JOIN Category ON ArticleCategories.c_id=Category.id 
                WHERE Articles.a_id=ArticleCategories.a_id 
                  AND Category.cat_name LIKE '%'+{$match}+'%'
             )

Note dont take the Execution Times on sqlfriddle for real its an busy server and the times vary alot to make a good statement but look to what View Execution Plan has to say

see http://sqlfiddle.com/#!2/48817/21 for demo

Both querys always trigger an complete table scan on table Articles and two DEPENDENT SUBQUERYS thats not good if you have an large Articles table with many records. This means the performance depends on the number of Articles rows even when you want only the articles that are in the category.

Select * 
                From ArticleCategories 
                INNER JOIN Category ON ArticleCategories.c_id=Category.id 
                WHERE Articles.a_id=ArticleCategories.a_id 
                  AND Category.cat_name LIKE '%'+{$match}+'%'

This query is the inner subquery but when you try to run it, MySQL cant run because it depends on a value of the Articles table so this is correlated subquery. a subquery type that will be evaluated once for each row processed by the outer query. not good indeed

There are more ways off rewriting RBarryYoung query i will show one. The INNER JOIN way is much more efficent even with the LIKE operator Note ive made an habbit out off it that i start with the table with the lowest number off records and work my way up if you start with the table Articles the executing will be the same if the MySQL optimizer chooses the right plan..

SELECT 
   Articles.a_id
 , Articles.name
 , Articles.description
FROM 
 Category

INNER JOIN
 ArticleCategories
ON
 Category.id = ArticleCategories.c_id

INNER JOIN
 Articles
ON 
 ArticleCategories.a_id = Articles.a_id

WHERE 
 cat_name LIKE '%php%';
;

see http://sqlfiddle.com/#!2/43451/23 for demo Note that this look worse because it looks like more rows needs to be checkt

Note if the Article table has low number off records RBarryYoung EXIST way and INNER JOIN way will perform more or less the same based on executing times and more proof the INNER JOIN way scales better when the record count become larger

http://sqlfiddle.com/#!2/c11f3/1 EXISTS oeps more Articles records needs to be checked now (even when they are not linked with the ArticleCategories table) so the query is less efficient now http://sqlfiddle.com/#!2/7aa74/8 INNER JOIN same explain plan as the first demo

Extra notes about scaling it becomes even more worse when you also want to ORDER BY or GROUP BY the NOT EXIST way has an bigger chance it will create an disk based temporary table that will kill MySQL performance

Lets also analyse the LIKE '%php%' vs = 'php' for the EXIST way and INNER JOIN way

the EXIST way

http://sqlfiddle.com/#!2/48817/21 / http://sqlfiddle.com/#!2/c11f3/1 (more Articles) the explain tells me both patterns are more or less the same but 'php' should be little faster because off the const type vs ref in the TYPE column but LIKE %php% will use more CPU because an string compare algoritme needs to run.

the INNER JOIN way

http://sqlfiddle.com/#!2/43451/23 / http://sqlfiddle.com/#!2/7aa74/8 (more Articles) the explain tell me the LIKE '%php%' should be slower because 3 more rows need to be analysed but not shocking slower in this case (you can see the index is not really used on the best way).

RBarryYoung way works but doenst keep performance atleast not on a MySQL server see http://sqlfiddle.com/#!2/b2bd9/1 or http://sqlfiddle.com/#!2/34ea7/1 for examples that will scale on large tables with lots of records this is what the topic starter needs