I currently have a Users table and now want to add other user-related information for a particular user. The form that accepts this information has fields like languages and OS each with a list of options with checkboxes.
For Example:
Languages known: checkbox PHP, Java, Ruby
OS knowledge: Windows, Linux, Mac
Currently my database tables looks like this:
USER
----------------------------------------
| ID | Name |
-----------------------
| 1 | John |
-----------------------
| 2 | Alice |
-----------------------
LANGUAGES
----------------------------------------
| ID | User_ID(FK) | lang-name |
----------------------------------------
| 1 | 1 | PHP |
----------------------------------------
| 1 | 2 | Java |
----------------------------------------
OS
----------------------------------------
| ID | User_ID(FK) | os-name |
----------------------------------------
| 1 | 1 | Windows |
----------------------------------------
| 1 | 2 | Windows |
----------------------------------------
Does this seem like a good schema? There are many more such user-related fields that will each need to have their own table and there seems to be a lot of redundancy within a table since thousands of users will know PHP and hence there will be thousands of rows with PHP as the language for each of the different users.
Is there a better way to organize the schema?
Perhaps you could make
Language
andOS
first-class entities in the database with their own tables, then use a joining table for the many-to-many relationship withUser
. Something like this:That way the actual entities (
User
,Language
,OS
) are self-contained with only the data that's meaningful to them, not polluted or duplicated with the concerns of their relationships with each other. And the relationships are contained within their own simple numeric-only tables, which themselves aren't entities but are just many-to-many links between entities.No data is duplicated (in your sample data,
Language
andOS
would each have only three records, at least for now), and it would be a lot friendlier to ORMs and other frameworks if you ever need to use one.Edit: Based on your comment, you might try something like this:
This gives you a lot of flexibility. In your sample data,
Language
andOS
would be records inLookupType
. All of the languages and OSes would be values inLookup
which link back to their respectiveLookupType
. So still no repeating of data. And theUserLookup
table is the only many-to-many link table.Be careful with this design, though. It is flexible, definitely. But when you use this table structure as your actual domain models you run into situations where terms like "Lookup" become business terms, and that's probably not the case. "Language" and "OS" are the actual models. I would recommend using Views or perhaps Stored Procedures to abstract this structure from the code. So the code would pull Languages from the Language view or procedure, not directly from the
Lookup
table.