I am relatively new to big data processing and am looking for some specific guidance from the SO community.
We are currently set up with a monolithic, sequential ETL process. Needless to say, it does not scale as our data grows. What are our options? Distributing and parallelizing the work are the obvious answers, but I need specifics. I have played with Hadoop and it may be appropriate here, but I am wondering what other options are out there. Maybe something that is easier for a database developer to transition to?
A related question: we also have an OLAP cube for aggregated data. Are Elasticsearch or Solr good candidates for replacing an OLAP cube? Has anyone done this successfully? What are the gotchas?
We are currently working on the same kind of use case, so our approach may be useful to you.
Step 1: We use Sqoop to import data from the databases into HDFS.
Step 2: We write the ETL logic as Pig scripts.
Step 3: We build a Solr index on the aggregated table data.
Step 4: Users search Solr through a web interface.
In our use case, we are developing Pig jobs that perform the transformation logic and store their results incrementally in final output folders. Later, the MapReduce indexer tool (MapReduceIndexerTool in Cloudera Search) indexes that data into Solr. Let me know if you have any questions.
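
To make the shape of steps 2 and 3 more concrete, here is a rough sketch in plain Python rather than Pig. It is only an illustration of the clean, group, and aggregate logic and of the documents that would be handed to the indexer, not our actual scripts. The field names (`region`, `day`, `amount`) and the function names are made up for the example, and it assumes each incremental batch contains complete days of data.

```python
from collections import defaultdict


def transform(rows):
    """Stand-in for the Pig transformation step: drop rows with no
    amount and normalise the remaining fields (hypothetical schema)."""
    for row in rows:
        if row.get("amount") is None:
            continue
        yield {
            "region": row["region"].strip().lower(),
            "day": row["day"],
            "amount": float(row["amount"]),
        }


def aggregate(rows):
    """Group by (region, day) and compute a sum and a count,
    roughly what GROUP ... / FOREACH ... GENERATE would do in Pig."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for r in rows:
        key = (r["region"], r["day"])
        totals[key]["total"] += r["amount"]
        totals[key]["count"] += 1
    return totals


def to_solr_docs(totals):
    """Turn each aggregate into a flat document. The id is derived
    from the grouping key, so re-indexing the same batch replaces
    documents instead of duplicating them."""
    docs = []
    for (region, day), agg in sorted(totals.items()):
        docs.append({
            "id": f"{region}_{day}",
            "region": region,
            "day": day,
            "total_amount": agg["total"],
            "order_count": agg["count"],
        })
    return docs


def run_batch(rows):
    """Process one incremental batch end to end."""
    return to_solr_docs(aggregate(transform(rows)))
```

Calling `run_batch` on one batch of extracted rows returns a list of flat dictionaries, one per (region, day) pair. Because the `id` is deterministic, a batch can be re-run safely as long as Solr uses that field as its unique key. If a batch could contain only part of a day, the aggregates would need to be merged with earlier results before indexing.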