MySQL query to get all (additional) symptoms and diseases

1.6k views Asked by At

I have a database which contains main symptoms, additional symptoms and diseases. I need to replace 2 queries because I am sure my first query is not efficient and my 2nd is not correct at all. I hope anyone can help me because I am new to this area..

Database explanation:

The database is being used by a medical app:

  1. The user selects a specific bodypart
  2. The app lists all main symptoms of that specific bodypart
  3. The user selects a main symptom (common or less common)
  4. The app lists all diseases of the selected main symptom. There also appear 2 checkboxes (additional symptoms) that can be checked by the user. The order of the listed diseases (d_weight) depends on age, gender, selected main symptom and what boxes the user has checked. A disease with d_weight <= 5 is considered to be a common disease. A disease with d_weight > 5 is considered to be less-common. The possibilities of user input (age, gender, bodypart, main symptom) is stored in the symptom_disease_combi table.

    • asa_id is the id of all symptom that apply (addtional symptoms that are checked by the user)
    • asc_id is the id of all possibilities of additional symptoms that belong to the specific main symptom. For example, asc_id = 0, in case there is no additional symptom selected. asc_id = 1, in case only additional symptom 'Insomnia' is selected. asc_id = 2, in case both 'Insomnia' and 'Blowing up' are selected.

1. Query to get all symptoms of a specific bodypart (can be improved):

SELECT DISTINCT s.name
              , s.id
              , sdc.s_common 
           FROM symptom as s
              , symptom_disease_combi as sdc 
          WHERE sdc.age = ".$age." 
            AND sdc.gender = ".$gender." 
            AND sdc.bodypart = ".$bodypart." 
            AND sdc.s_id = s.id

2. Query to get all diseases and additional symptoms of selected symptom (doesn't work):

SELECT DISTINCT d.name
              , d.id
              , sdc.d_weight
              , adls.id
              , adls.name 
           FROM disease as d 
              , symptom_disease_combi as sdc
              , symptom as s
              , adlsymptom as adls 
          WHERE sdc.age = ".$age." 
            AND sdc.gender = ".$gender." 
            AND sdc.bodypart = ".$bodypart." 
            AND s.id = ".$sid." 
            AND sdc.s_id = s.id

3. Database structure (please let me know if my design can be improved)

CREATE TABLE symptom
(id INT NOT NULL AUTO INCREMENT
,name VARCHAR(100) DEFAULT NULL
,critical INT NOT NULL
,PRIMARY KEY (id)
) ENGINE=MyISAM;

id name                    critical
 1 Behavioral disturbances        1
 2 Ear pressure                   0
 3 Elevated temperature           0
 4 Fainting                       0
 5 Head pain                      0

CREATE TABLE disease (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) DEFAULT NULL,
critical int(11) NOT NULL,
description text NOT NULL,
tests text NOT NULL,
treatment text NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM AUTO_INCREMENT=19 DEFAULT CHARSET=latin1

id name                critical    description    tests       treatment
 1 Adjustment disorder 0           lorem ipsum    lorem ipsum lorem ipsum
 2 ADHD                0           lorem ipsum    lorem ipsum lorem ipsum
 3 Drug reaction       0           lorem ipsum    lorem ipsum lorem ipsum
 4 Seizure (epilepsy)  1           lorem ipsum    lorem ipsum lorem ipsum        

CREATE TABLE adlsymptom (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL,
PRIMARY KEY (id)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1

id name
 1 Insomnia
 2 Blowing up
 3 Depressed
 4 Drug abuse

CREATE TABLE adlsymptom_apply (
id int(11) NOT NULL,
as_id int(11) NOT NULL,
PRIMARY KEY (id,as_id),
KEY FK_additional_symptom_that_apply (as_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

id as_id
 1 1
 1 2

CREATE TABLE adlsymptom_combi (
id int(11) NOT NULL,
as_id int(11) NOT NULL,
PRIMARY KEY (id,as_id),
KEY FK_additional_symptom_combination (as_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

id as_id
 1 1
 2 2
 3 1
 3 2

CREATE TABLE symptom_disease_combi (
id int(11) NOT NULL AUTO_INCREMENT,
age int(11) NOT NULL,
gender int(11) NOT NULL,
bodypart int(11) NOT NULL,
s_id int(11) NOT NULL,
s_common int(11) NOT NULL,
asc_id int(11) NOT NULL,
asa_id int(11) NOT NULL,
d_id int(11) NOT NULL,
d_weight int(11) NOT NULL,
PRIMARY KEY (id),
KEY FK_symptom (s_id),
KEY FK_additional_symptom_combination (asc_id),
KEY FK_additional_symptom_that_apply (asa_id),
KEY FK_disease (d_id)
) ENGINE=MyISAM AUTO_INCREMENT=65 DEFAULT CHARSET=latin1

id age gender bodypart s_id s_common asc_id asa_id d_id d_weight
 1 1   1      1        1    1        0      1      1    1
 2 1   1      1        1    1        0      1      2    2  
 3 1   1      1        1    1        0      1      3    3
 4 1   1      1        1    1        0      1      11   4

UPDATE 1:

  • critical in disease and symptom is created to tell the user that they need to go to the hospital immediately, when they click on the disease or symptom
  • age, gender, bodypart are id's, so age = 1, means 0-5, age = 2 means 6-17, age = 3 means 18-59 and age = 4 means 60+.
  • Please take a look at the design of the application, it will help a lot to understand the design of the database: https://i.stack.imgur.com/ley6T.png Btw, in the design; cause == disease...
  • asa_id refers to id of adlsymptom_apply
  • asc_id referst to id of adlsymptom_combi
  • The 'distinct' was used to get all symptoms/diseases just 1 time. I am sure it's not needed, but I don't know how to fix it.
1

There are 1 answers

1
AsConfused On BEST ANSWER

ditch 2 symptom tables, go with one (symptom) and 1 intersect table (sdc)

i would not add a new column to symptom, such as status/level trying to jack up a symptom in importance to main or secondary, despite the temptation, because this could easily make it not flexible.

for instance, fainting may seem main for once disease/condition but it could skew it in general

go for generality, therefore 1 table for symptoms, you correctly have a d_weight in sdc

i like your sdc.d_weight concept.

for instance, fainting might have some weight with epilepsy, but not for the flu. the whole concept is screwed up when someone with the flu is prescribed Zarontin and not Tamiflu

since i like your sdc.d_weight concept, i wonder why you went with the additional symptoms table

in table sdc, you have keys/indexes with names starting with "FK_". Hopefully you have actual FK constrainst

and not just naming conventions that make you think you have them (FK's, you don't have them)

for instance, real FK's for bodypart, symptom, and disease/condition.

as the user selects symptoms, remove them from the GUI's ability to add that symptom again for search

this cuts down work and simplifies querying by ditching the secondary table (line 1 of what i wrote and suggest)

note, again, that your use of KEY (which is a synonym for index, not foreign key constraint), just creates an index ...

create table symptom
(
    id int not null auto_increment primary key,
    name varchar(100) not null, -- if you can't put a name to it, don't have it
    critical int not null
)ENGINE=MyISAM default charset=latin1;

create table disease
(
    id int not null auto_increment primary key, -- don't mix and match int with int(11)
    name varchar(100) not null, -- if you can't put a name to it, don't have it
    critical int not null,
    -- etc columns, text
)ENGINE=MyISAM default charset=latin1;

.

create table symptom_disease_combi
(   -- a.k.a. sdc
    id int not null auto_increment primary key, -- don't mix and match int with int(11)
    age int not null,
    gender char(1) not null,    -- int(11) is overkill
    bodypart int not null,
    s_id int not null,
    s_common int not null,
    asc_id int not null,
    asa_id int not null,
    d_id int not null,
    d_weight int not null,

    -- additional indexes (note pk above, so it is there and happy)

    -- note that some of your indexes (your KEYS) are worthless for the queries in question
    -- however they may be useful for other queries in your system
    -- for instance your asc and asa indexes will not be useful as they won't be picked up
    -- in relation to the question posed
    --
    -- you will need to find the proper balance of index bloat based on system needs

    INDEX idx_sdc_siddid (s_id,d_id,bodypart),  -- composite, general purpose
    INDEX idx_sdc_didsid (d_id,s_id,bodypart),  -- ditto but reverse
    INDEX idx_sdc_ascid (asc_id),
    INDEX idx_sdc_asaid (asa_id),
    -- put constraints here:
    CONSTRAINT `fk_sdc_bodypart` FOREIGN KEY (bodypart) REFERENCES bodypart(id),
    CONSTRAINT `fk_sdc_sid` FOREIGN KEY (s_id) REFERENCES symptom(id), -- don't mix and match int and int(11)
    CONSTRAINT `fk_sdc_did` FOREIGN KEY (d_id) REFERENCES disease(id)
    -- are there others, such as asc asa tables ??
)ENGINE=MyISAM default charset=latin1;

i think you need to think long and hard about the age column is sdc going to be loaded with a new row, one for each age, 50 to 120, for gender='M', for prostrate cancer? i am basing this on your age=xxxxx line, maybe you mean >=. or for leukemia, <= ... i think your schema needs a bit of thought

so i am excluding age from the below query

-- for the next query, i wonder why you need a distinct? -- is there more than 1 sdc.s_common for given symptom row ? only you would know -- if not, ditch the distinct

-- the order of your join on clause matters, make it follow a good index so results come back fast

-- i am not suggesting these are covered indexes, but you will minimize table scan activity

select distinct s.name, s.id, sdc.s_common
from symptom s
join symptom_disease_combi sdc
on sdc.s_id=s.id and sdc.bodypart=".$bodypart." and sdc.gender= ".$gender."

-- not my comment above about age -- incorporate weights as needed in a computed column

those are my suggestions, and good luck