Data warehousing principles and NoSQL

1.1k views Asked by At

with MongoDB, CouchDB and related technologies we can get faster querying so is this still valid?

“A copy of transaction data, specially restructured for queries and analyses.” (R. Kimball The Data Warehouse Toolkit, 1996

I mean, do we really need to restructure our data to an OLAP scheme to query it for analysis purposes? More specifically can drill-down, slice and dice and other reporting for analysis purposes be achieved with NoSQL (NOT necessarily with OLAP modelling)? Also could we overcome the "data subset" querying limitation of OLAP and report on the whole data universe with NoSQL?

2

There are 2 answers

0
Gabe Rainbow On

In my estimation OLAP subsets or structures will not go away and may become more common for a few reasons. In no particular order: f) Map-reduce is all you get in many cases. Mongodb is on a steadier foot with their speedier aggregation-pipeline; u) A big gotcha with NoSQL is the lack of joins or relationships. Meaning that your underlying data has to be ugly in order to support many OLAP reports; b) Its worthwhile constructing 'throw away' or volatile data subsets simply to keep a clean master table/collection; a) NoSQL is perfectly suited for redundant datasets: there are no create table or even schemas needed, its dead simple to spin up and kill collections; r) NoSQL is heaps easier to scale for the additional dataset than SQL; d) A fledgling start-up can avoid the cost and resources needed to support two db technologies ( one for OLAP and one for OLTP ); and, b) you'll find your backend / frontend code much much much easier and manageable with massaged data sets; and, c) the unbeatable speed advantage of premade datasets with their own premade indices.

0
Siddharth Shukla On

Answer to both of your questions is YES. 1. It is still valid to restructure your transaction data for analysis. 2. You can use NoSQL to do everything you asked.

As you mentioned only about querying/analysis/OLAP, I am assuming the only consideration here is creating a querying/reporting platform. So, OLTP system and whether NoSQL can handle it or not is out of discussion.

It is difficult to answer this question without having a context associated to it. Context as in whether you are creating this platform for a team, department, vertical, business line etc. of an organization or you are creating this platform for whole organization as central repository.

If you are setting it up for a team/department, volume is not humongous, less number of users will query it, querying frequency is not that high then OLAP is still valid. But if volume is humongous and and with high frequency of querying and large number of users and you see that you will need to scale in future then NoSQL will be your bet.

Also, if you create a platform for NoSQL at an enterprise level. Say - you create a enterprise data warehouse or a data lake which caters any and every audience from the organization. But within organization teams/departments might create their own OLAP by creating Data Marts to cater their own needs. So, in this case both OLAP and NoSQL is still valid.

I would say it totally depends on your use case. To reach a decision there are various factors that need to be considered. Pros and cons are always there for any technology in consideration. There is no generic answer for these kind of comparisons. You need to answer questions like - What are your data sources and their format; if they are structured, semi-structured, unstructured? Who are your users and how many; if there are multiple departments with different needs, if they need their separate dashboard, whether they need to access each others data? What is the volume of the data you will be handling? What is the frequency of querying the reporting platform? And many more questions you can ask yourself. After answering these questions then decide what is the best suited option for you.