I was reading about in-memory databases that offer a data compression feature. Instead of storing values such as first name, last name and father's name as-is in each column (which duplicates a lot of data and wastes storage), the database creates a dictionary and an attribute vector for each column: each distinct value is stored only once in the dictionary, and the original table keeps an attribute vector of references (integer value IDs) that point into the dictionary.
The clear advantage of this approach is that it saves a lot of space by eliminating duplicated data.
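A minimal sketch of the dictionary encoding described above, for a single column. The function names `dictionary_encode` and `dictionary_decode` are hypothetical, and the sorted dictionary is an assumption; real in-memory column stores use their own, more compact layouts.

```python
from array import array


def dictionary_encode(column):
    """Return (dictionary, attribute_vector) for a list of values.

    The dictionary holds each distinct value once; the attribute vector holds,
    for every row, the integer position (value ID) of that row's value in the
    dictionary.
    """
    dictionary = sorted(set(column))  # distinct values, sorted (an assumption)
    value_id = {v: i for i, v in enumerate(dictionary)}
    # "I" = unsigned int: each row now stores a small integer, not a string.
    attribute_vector = array("I", (value_id[v] for v in column))
    return dictionary, attribute_vector


def dictionary_decode(dictionary, attribute_vector):
    """Rebuild the original column by looking each value ID up."""
    return [dictionary[i] for i in attribute_vector]


first_names = ["Anna", "Ravi", "Anna", "John", "Ravi", "Anna"]
dictionary, vector = dictionary_encode(first_names)
print(dictionary)    # ['Anna', 'John', 'Ravi']
print(list(vector))  # [0, 2, 0, 1, 2, 0]
assert dictionary_decode(dictionary, vector) == first_names
```

The decode step is the lookup that plays the role of the join mentioned in the question. Keeping the dictionary sorted means value IDs follow the order of the values, but a new value that falls in the middle of the dictionary forces existing IDs to be re-encoded.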
I want to know:
- Do RDBMSs like Oracle and MySQL implicitly follow this approach when they store data on disk? Or, when we use these RDBMSs, do we have to implement it ourselves to get the same benefit?
- Since there is no free lunch, what are the trade-offs if a developer implements the data compression approach described above? One I can think of is that, to fetch the data, I would have to join my dictionary table with the main table. Is that right?
Please share your thoughts and inputs.
This answer is based on my understanding of your question. It appears that you are mixing up two concepts: data normalisation and data storage optimisation.
Data normalisation: This is a process that the application developer needs to undertake. Pieces of data that would otherwise be stored repeatedly are stored only once and referenced by identifiers, which are typically integers. This way the database consumes only as much space as is needed to store the repeating data once. This is common practice when storing strings and other variable-length data in database tables. To retrieve the data, the application has to perform joins between the related tables, and these joins directly affect application performance, depending on how the related tables are designed.
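A minimal sketch of this kind of normalisation, using Python's built-in `sqlite3` module with an in-memory database. The table names `first_name` and `person` and the helper `add_person` are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each distinct first name is stored once and given an integer ID.
    CREATE TABLE first_name (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- The main table stores only the integer reference.
    CREATE TABLE person (
        id            INTEGER PRIMARY KEY,
        first_name_id INTEGER NOT NULL REFERENCES first_name(id)
    );
    """
)


def add_person(conn, name):
    """Insert a person, reusing the existing first_name row if there is one."""
    # The UNIQUE constraint makes this a no-op when the name already exists.
    conn.execute("INSERT OR IGNORE INTO first_name (name) VALUES (?)", (name,))
    (name_id,) = conn.execute(
        "SELECT id FROM first_name WHERE name = ?", (name,)
    ).fetchone()
    conn.execute("INSERT INTO person (first_name_id) VALUES (?)", (name_id,))


for name in ["Anna", "Ravi", "Anna", "John", "Ravi", "Anna"]:
    add_person(conn, name)

# Reading the names back requires a join between the two tables.
rows = conn.execute(
    """
    SELECT p.id, f.name
    FROM person AS p
    JOIN first_name AS f ON f.id = p.first_name_id
    ORDER BY p.id
    """
).fetchall()
print(rows)
# [(1, 'Anna'), (2, 'Ravi'), (3, 'Anna'), (4, 'John'), (5, 'Ravi'), (6, 'Anna')]
```

Only three name strings are stored, but every insert now costs an extra lookup and every read costs a join; that is the trade-off the question anticipates.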
Data storage optimisation: This is handled by the RDBMS itself. It involves things like maintaining the B-tree structures that hold the data, compressing data before it is stored, managing free space within the data files, etc. Different RDBMSs handle these in different ways (some patented and proprietary, others more general). Mature RDBMSs like Oracle and MySQL implement well-established storage techniques, but some of them, notably compression, are opt-in: Oracle table compression is enabled with the COMPRESS clause and InnoDB compression with ROW_FORMAT=COMPRESSED, and neither is on by default.
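To illustrate the idea of compressing data before it is stored, here is a sketch that serialises rows into a fixed-size "page", compresses it with `zlib` on write and decompresses it on read. This is not Oracle's or InnoDB's actual on-disk format; `PAGE_SIZE` and the helper functions are hypothetical.

```python
import zlib

PAGE_SIZE = 16 * 1024  # assumed page size for this sketch


def build_page(rows):
    """Serialise rows as tab-separated lines and pad to PAGE_SIZE bytes."""
    data = "".join("\t".join(row) + "\n" for row in rows).encode("utf-8")
    if len(data) > PAGE_SIZE:
        raise ValueError("rows do not fit in one page")
    return data.ljust(PAGE_SIZE, b"\0")


def write_page(page):
    """What a storage engine might do before writing: compress the bytes."""
    return zlib.compress(page)


def read_page(stored):
    """And on read: decompress back to the original page."""
    return zlib.decompress(stored)


rows = [("Anna", "Smith", "Ravi"), ("John", "Smith", "Ravi")] * 200
page = build_page(rows)
stored = write_page(page)
assert read_page(stored) == page
print(len(page), len(stored))  # the compressed page is far smaller
```

Unlike normalisation, this is transparent to queries (no joins are needed), but it spends CPU time compressing on every write and decompressing on every read.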
Hope this helps.