I am new to AWS EMR and have created a Hive-HBase table using the following code:
CREATE EXTERNAL TABLE IF NOT EXISTS airflow.card_transactions(
  card_id bigint,
  member_id bigint,
  amount float,
  postcode int,
  pos_id bigint,
  transaction_dt timestamp,
  status string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hadoop/projectFD_pipeline/card_transactions';

CREATE TABLE IF NOT EXISTS airflow.card_transactions_bucketed(
  cardid_txnts string,
  card_id bigint,
  member_id bigint,
  amount float,
  postcode int,
  pos_id bigint,
  transaction_dt timestamp,
  status string)
CLUSTERED BY (card_id) INTO 8 BUCKETS
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key,trans_data:card_id,trans_data:member_id,trans_data:amount,trans_data:postcode,trans_data:pos_id,trans_data:transaction_dt,trans_data:status')
TBLPROPERTIES('hbase.table.name'='card_transactions');
When I tried to insert values into this table:

INSERT OVERWRITE TABLE airflow.card_transactions_bucketed
SELECT concat_ws('~', cast(card_id as string), cast(transaction_dt as string)) as cardid_txnts,
       card_id, member_id, amount, postcode, pos_id, transaction_dt, status
FROM airflow.card_transactions;
it started failing with this error:
ERROR [25bd1caa-ccc6-4773-a13a-55082909aa47 main([])]: exec.Task (TezTask.java:execute(231)) - Failed to execute tez graph.
org.apache.hadoop.hbase.TableNotFoundException: Can't write, table does not exist:card_transactions
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:185) ~[hbase-server-1.4.13.jar:1.4.13]
    at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.checkOutputSpecs(HiveHBaseTableOutputFormat.java:86) ~[hive-hbase-handler-2.3.9-amzn-2.jar:2.3.9-amzn-2]
    at org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46) ~[hive-exec-2.3.9-amzn-2.jar:2.3.9-amzn-2]
The table 'airflow.card_transactions_bucketed' is created and available in Hive, but the HBase table 'card_transactions' (named in 'hbase.table.name') is not. I don't see any errors in hive.log.
I was expecting the HBase table to be created as well.
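To confirm the table really is missing on the HBase side (rather than just unreadable from Hive), you can check from the HBase shell on the cluster's master node; a minimal check looks like:

```
$ hbase shell
hbase> list
TABLE
0 row(s)
hbase> exists 'card_transactions'
Table card_transactions does not exist
```

In my case, 'card_transactions' does not appear in the list, which matches the TableNotFoundException above.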
So it looks like, unlike in Cloudera, on AWS EMR the HBase table needs to be created manually. The query above does not create the HBase table; it only integrates the Hive table with an HBase table that already exists in the cluster. After creating the table in HBase, I was able to insert data through the integrated Hive table, and the data showed up when queried in HBase.
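For anyone hitting the same error: the fix is to pre-create the HBase table with the column family referenced in 'hbase.columns.mapping' (here, trans_data) before creating the storage-handler table in Hive. A minimal sketch from the HBase shell:

```
hbase> create 'card_transactions', 'trans_data'
```

Once the HBase table exists with the matching column family, the CREATE TABLE ... STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' statement and the subsequent INSERT OVERWRITE both work as expected.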