Database schema for multiple checkboxes

2.1k views Asked by At

I currently have a Users table and now want to add other user-related information for a particular user. The form that accepts this information has fields like languages and OS each with a list of options with checkboxes.

For Example:

Languages known: checkbox PHP, Java, Ruby

OS knowledge: Windows, Linux, Mac

Currently my database tables looks like this:

USER
----------------------------------------
|  ID  |  Name      |  
-----------------------
|   1  |    John    |    
-----------------------
|   2  |    Alice    |    
-----------------------

LANGUAGES
----------------------------------------
|  ID  | User_ID(FK)     | lang-name  | 
----------------------------------------
|   1  |    1            |   PHP      |
----------------------------------------
|   1  |    2            |   Java     |
----------------------------------------

OS
----------------------------------------
|  ID  | User_ID(FK)     | os-name  | 
----------------------------------------
|   1  |    1            | Windows  |
----------------------------------------
|   1  |    2            |  Windows |
----------------------------------------

Does this seem like a good schema? There are many more such user-related fields that will each need to have their own table and there seems to be a lot of redundancy within a table since thousands of users will know PHP and hence there will be thousands of rows with PHP as the language for each of the different users.

Is there a better way to organize the schema?

1

There are 1 answers

4
David On

Perhaps you could make Language and OS first-class entities in the database with their own tables, then use a joining table for the many-to-many relationship with User. Something like this:

User
---------
ID
Name
etc...

Language
---------
ID
Name

OS
---------
ID
Name

UserLanguage
---------
UserID
LanguageID

UserOS
---------
UserID
OSID

That way the actual entities (User, Language, OS) are self-contained with only the data that's meaningful to them, not polluted or duplicated with the concerns of their relationships with each other. And the relationships are contained within their own simple numeric-only tables, which themselves aren't entities but are just many-to-many links between entities.

No data is duplicated (in your sample data, Language and OS would each have only three records, at least for now), and it would be a lot friendlier to ORMs and other frameworks if you ever need to use one.

Edit: Based on your comment, you might try something like this:

User
---------
ID
Name
etc...

Lookup
---------
ID
LookupTypeID
Value

LookupType
---------
ID
Value

UserLookup
---------
UserID
LookupID

This gives you a lot of flexibility. In your sample data, Language and OS would be records in LookupType. All of the languages and OSes would be values in Lookup which link back to their respective LookupType. So still no repeating of data. And the UserLookup table is the only many-to-many link table.

Be careful with this design, though. It is flexible, definitely. But when you use this table structure as your actual domain models you run into situations where terms like "Lookup" become business terms, and that's probably not the case. "Language" and "OS" are the actual models. I would recommend using Views or perhaps Stored Procedures to abstract this structure from the code. So the code would pull Languages from the Language view or procedure, not directly from the Lookup table.