Use case HBase on EMR

Question

1.5k views. Asked by GermainGum on 12 June 2015 at 10:39

I read the AWS documentation, but one point is still unclear.

Is S3 the primary storage of an EMR cluster, or does the data live on the EC2 instances, with S3 holding only a copy?

From the docs:

"HBase on Amazon EMR provides the ability to back up your HBase data directly to Amazon Simple Storage Service (Amazon S3)"
"Hadoop clusters running on Amazon EMR use EC2 instances as virtual Linux servers for the master and slave nodes, Amazon S3 for bulk storage of input..."
"provides the ability to launch a new cluster and populate it with data from a previous HBase backup"

My use case: use HBase to store terabytes of data, and update my tables only two or three times a month by starting an EMR cluster. The tables would be stored on S3.

Original Q&A

There are 2 answers

Sergei Rodionov On 13 August 2017 at 19:55

As of EMR 5.2.0 you can run HBase 1.3.0 and higher directly on AWS S3.

The change is to point hbase.rootdir in the hbase-site.xml file at an s3:// location instead of an hdfs:// one:

"hbase.rootdir": "s3://my-bucket/hbase"

No changes to HBase clients are required. The configuration simplifies operations by eliminating the need to manage HDFS NameNode and DataNodes.
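On EMR this is normally supplied as a configuration object when the cluster is created, rather than by editing hbase-site.xml by hand. A minimal sketch, assuming the placeholder bucket `my-bucket` from above; as far as I know from the EMR documentation, the `hbase` classification also has to set `hbase.emr.storageMode` to `s3`:

```json
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "s3://my-bucket/hbase"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  }
]
```

Saved to a file (say `hbase-s3.json`, a hypothetical name), it can be passed with `aws emr create-cluster --configurations file://hbase-s3.json` alongside your usual cluster options.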

**ChristopherB** · Accepted Answer · 2015-06-14T11:28:05+00:00

The key question in your use case is how the data should be available between updates.

If your goal is to have the data accessible through an HBase interface all the time, then an HBase cluster (such as one on EMR) would need to be up and running continually. HBase currently supports only HDFS as live storage for HFiles. S3 storage is external to the cluster, so it can be used as a destination for backups or for other ingress/egress of data.
