Saving array data in normalized table

542 views Asked by At

I am updating my user management system with custom user settings feature. In other words, I want to save custom user settings/options in database. Initially I considered serializing the data and saving it to a TEXT field. But this has the disadvantage of not being searchable by configuration value.

Hence I have decided to normalize the data and store in a different table. Since the configuration data is multi dimensional, I had to flatten the data. For example,

Array (
    [user] => Array (
            [is_developer] => 1
            [api_access] => 1
        )

    [api] => Array (
            [scopes] => Array (
                    [0] => astro
                    [1] => user_info
                )

            [default_scope] => user_info
        )

    [astro] => Array (
            [location_id] => 12345
            [timezone] => America/New_York
            [coordinates] => Array (
                    [latitude] => 12.22
                    [longitude] => 24.44
                )

        )

)

is converted to

+---------+-----------------------------+------------------+
| user_id | config                      | value            |
+---------+-----------------------------+------------------+
|       1 | user.is_developer           | 1                |
|       1 | user.api_access             | 1                |
|       1 | api.scopes                  | astro            |
|       1 | api.scopes                  | user_info        |
|       1 | api.default_scope           | user_info        |
|       1 | astro.location_id           | 12345            |
|       1 | astro.timezone              | America/New_York |
|       1 | astro.coordinates.latitude  | 12.22            |
|       1 | astro.coordinates.longitude | 24.44            |
+---------+-----------------------------+------------------+

Is there a better way for storing this data?

Advantages:

  • Ability to search through custom options, like list all users who are developers and have api access.
  • Ability to load just a single option group.
    SELECT * FROM user_config WHERE config LIKE 'astro.%' AND user_id = 1;

Disadvantages:

  • Cannot specify a unique key, since numeric arrays have save config key. If I append the numeric index to the key (api.scopes.0, api.scopes.1 etc), then advantage #1 is partially lost.
  • Since unique key is not present, I am not able to use the better performing INSERT ... ON DUPLICATE UPDATE query. Instead, when the data needs to be updated, I have to delete the existing values and re-insert the data within a transaction.

Update:

After reading unique2's answer, I got another idea. Further normalizing the data might solve the unique key issue. i.e., save the keys in one table and the data in another. Need to check if INSERT ... ON DUPLICATE UPDATE will work with multiple tables. Insert does not work with multiple tables, so I guess that makes further normalization irrelevant.

mysql> SELECT * FROM user_config;
+-----------+---------+-----------------------------+
| config_id | user_id | config                      |
+-----------+---------+-----------------------------+
|         1 |       1 | api.default_scope           |
|         2 |       1 | api.scopes                  |
|         3 |       1 | astro.coordinates.latitude  |
|         4 |       1 | astro.coordinates.longitude |
|         5 |       1 | astro.location_id           |
|         6 |       1 | astro.timezone              |
|         7 |       1 | user.api_access             |
|         8 |       1 | user.is_developer           |
+-----------+---------+-----------------------------+
8 rows in set (0.00 sec)

mysql> SELECT * FROM user_config_value;
+-----------+------------------+
| config_id | value            |
+-----------+------------------+
|         1 | user_info        |
|         2 | astro            |
|         2 | user_info        |
|         3 | 12.22            |
|         4 | 24.44            |
|         5 | 12345            |
|         6 | America/New_York |
|         7 | 1                |
|         8 | 1                |
+-----------+------------------+
9 rows in set (0.00 sec)

Note:

  • Storing the data in a NOSQL database is not an option.
  • Actually, this is more of an academic question, because the actual custom data stored is very small and that too for very few users. Hence storing it as a serialized array or loading the entire array is not going to be a huge overhead, in my particular case.
2

There are 2 answers

2
unique2 On

Update (see previous version below):

With both your solution and my previous version there is actually the problem that not all information from your array structure is saved. You can still avoid api.scopes.1 by putting the last element of the option name in a separate field. If you combine this with soft deletion, you can use INSERT ... ON DUBLICATE UPDATE.

+----------+-------------------+---------------+------------------+---------+
| user_id* | config_group*     | config*       | value            | deleted |
+----------+-------------------+---------------+------------------+---------+
|        1 | user              | is_developer  | 1                |       0 |
|        1 | user              | api_access    | 1                |       0 |
|        1 | api.scopes        | 1             | astro            |       0 |
|        1 | api.scopes        | 2             | user_info        |       0 |
|        1 | api               | default_scope | user_info        |       0 |
|        1 | astro             | location_id   | 12345            |       0 |
|        1 | astro             | timezone      | America/New_York |       0 |
|        1 | astro.coordinates | latitude      | 12.22            |       0 |
|        1 | astro.coordinates | longitude     | 24.44            |       0 |
+----------+-----------------------------------+------------------+---------+
* marks key columns

Previous version:

If you split your data in two tables you can use a unique key for each of them.

The first table contains all config options which take on a single value (* marks key columns):

+----------+-----------------------------+------------------+
| user_id* | config*                     | value            |
+----------+-----------------------------+------------------+
|        1 | user.is_developer           | 1                |
|        1 | user.api_access             | 1                |
|        1 | api.default_scope           | user_info        |
|        1 | astro.location_id           | 12345            |
|        1 | astro.timezone              | America/New_York |
|        1 | astro.coordinates.latitude  | 12.22            |
|        1 | astro.coordinates.longitude | 24.44            |
+----------+-----------------------------+------------------+

The second table contains all options which consist of a set of values (with no dublicates; * marks key columns):

+----------+-----------------------------+------------------+
| user_id* | config*                     | value*           |
+----------+-----------------------------+------------------+
|        1 | api.scopes                  | astro            |
|        1 | api.scopes                  | user_info        |
+----------+-----------------------------+------------------+

Thus you can use your database to ensure data integrity. INSERT ... ON DUBLICATE UPDATE works naturally with the first table. You can also use it with your second table if you are willing to use soft delete:

+----------+-----------------------------+------------------+---------+
| user_id* | config*                     | value*           | deleted |
+----------+-----------------------------+------------------+---------+
|        1 | api.scopes                  | astro            |       0 |
|        1 | api.scopes                  | user_info        |       0 |
|        1 | api.scopes                  | old              |       1 |
+----------+-----------------------------+------------------+---------+
0
Bill Karwin On

You have reinvented a design called Entity-Attribute-Value. This is not what normalization means. Neither of the proposed ideas in your question, nor the proposed solution from @unique2's answer, fit any definition of a normal form -- they're not even relations.

You're correct that unique constraints don't work. Neither do foreign key constraints, or even NOT NULL. For example, how would you make the database enforce a mandatory value for user.api_access?

For more discussion on why EAV is a non-relational design, see my blog EAV FAIL, or many of my answers on StackOverflow for questions.

For alternative solutions to support user-defined attributes, see my presentation Extensible Data Modeling. I gave an overview of the pros and cons of:

  • Extra Columns
  • Entity-Attribute-Value
  • Class Table Inheritance
  • Serialized LOB
  • Inverted Indexes