Should I normalize or not? If yes how?

312 views Asked by At

Currently I have a table with a column containing CSVs. I am not sure whether to normalize the whole table or not. The problem is this column, configuration, may contain up to 50 or more different types of values. For example in the table shown below it's 18, 20, but for other data in the same column it may be 0, 20, 21, 22, 23, 25, 26, 27, 40, 52, 54, 55 and so on, however these values are unique. They will never repeat.

I do not know what is the maximum number for it(it may vary) so that's why I kept it in CSV. I am currently have trouble normalizing it, or rather I am not sure whether I should even normalize it. Any help here?

id    tester_type    device_id      board_id        configuration
75946   UFLEX           997           220   
44570   UFLEX           450           220               18,20
44569   UFLEX           449           220               18,20
44568   UFLEX           448           220               18,20
44567   UFLEX           447           220               18

Note: Configuration column does also contain empty values or empty spaces.

2

There are 2 answers

3
Ja͢ck On BEST ANSWER

I do have to query against it so I guess I have to normalize it.

Yes, you do :)

If do create the table, does that mean I have to create for every possible configuration value?

An example of a normalised structure would be:

join table
==========
test_id configuration_id (spanning unique constraint)
------- ----------------
44570   18
44570   20
44569   18
44569   20
44569   20
44568   18
44568   20
44567   18

configurations table
====================
configuration_id
----------------
18
20

If you're using InnoDB, each column of the join table is also a foreign key to their respective parent tables.

0
Rick James On

I disagree with both "must" and "must not" normalize stands. My 2 cents:

  • Do not normalize "continuous" values such as prices, numbers, dates, floats, etc.
  • Do not normalize values that are unique or nearly so.
  • Do not normalize fields that are narrow. For example, don't replace a 2-letter country code with a 4-byte country_id.

  • "Normalize for simplicity": Do normalize things that are used in multiple tables and are subject to change. Sometimes names, addresses, company names, etc fall into this category. This is so you can change the value in exactly one place, not lots of places.

  • "Normalize for space": Do normalize things that would save a significant amount of overall space for the dataset. (This applies to gigabyte tables much more so than to kilobyte tables.)

  • Normalize, but don't "over-normalize". You will figure out what I mean when you have over-normalized and a nasty JOIN can't be optimized.

If you would like further specific advice, let's see SHOW CREATE TABLE and sample values for any un-obvious columns.