Hive with Hadoop high availability


I want to understand how Hive knows which Hadoop NameNode is in the active state, and what happens when the active NameNode fails.


There are 2 answers

brandon.bell On

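For HA HDFS, use the Hive metatool to point the locations stored in the Hive metastore at the nameservice configured in dfs.nameservices. See https://cwiki.apache.org/confluence/display/Hive/Hive+MetaTool. dfs.nameservices is a logical name, while the actual NameNodes behind it are listed in dfs.ha.namenodes.[nameservice ID], and each one's address is set in dfs.namenode.rpc-address.[nameservice ID].[namenode ID].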

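As for which NameNode is active: Hive does not track this itself. The HDFS client that Hive uses resolves the logical name through the failover proxy provider set in dfs.client.failover.proxy.provider.[nameservice ID], and retries against the other NameNode if the one it contacted is in standby. With automatic failover enabled, a ZKFailoverController (ZKFC) runs next to each NameNode and coordinates through ZooKeeper, where the active NameNode holds a lock. When the active NameNode fails, its ZooKeeper session expires after ha.zookeeper.session-timeout.ms (the default has been 5 seconds; check core-default.xml for your release). The standby's ZKFC then fences the previous active NameNode using the method configured in dfs.ha.fencing.methods (at least one fencing method must be configured) and transitions the standby to active.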
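To see which NameNode is currently active, you can ask the cluster directly. The following is a minimal sketch, assuming a single nameservice and that the stock `hdfs getconf` and `hdfs haadmin` commands are on the PATH; the function name `active_namenode` is made up for this example.

```shell
# Sketch: print the ID of the NameNode that currently reports "active".
# Assumes one nameservice; pass its name as $1 or let it be read from the config.
active_namenode() {
  ns="${1:-$(hdfs getconf -confKey dfs.nameservices)}"
  # dfs.ha.namenodes.<nameservice> holds a comma-separated list such as "nn1,nn2"
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$ns") || return 1
  for id in $(printf '%s' "$ids" | tr ',' ' '); do
    # haadmin prints "active" or "standby" for the given NameNode ID
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      printf '%s\n' "$id"
      return 0
    fi
  done
  return 1  # no NameNode reported "active"
}
```

Calling `active_namenode` before and after stopping the active NameNode should show the ID change once failover has completed.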

Renjith On

In an HDFS HA environment, the NameNode URL should be a logical name (e.g. hdfs://logicalnamenode). Hive has to be configured to work with HA: update the NameNode location stored in the Hive metastore with the metatool command.

  1. List the current NN configuration:
    ~# metatool -listFSRoot
    hdfs://namenode:8020/user/hive/warehouse
  2. Replace the old NN location with the logical name (the new location comes first, then the old one):
    ~# metatool -updateLocation hdfs://logicalnamenode hdfs://namenode:8020 -tablePropKey avro.schema.url
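Since -updateLocation rewrites metastore records, it can be worth previewing the change first. metatool has a -dryRun option that prints the changes without persisting them. A small sketch of that workflow follows; `update_fs_root` and the `APPLY` variable are hypothetical names for this example, not part of Hive.

```shell
# Hypothetical helper (not part of Hive): preview a metastore location rewrite,
# and persist it only when APPLY=1 is set in the environment.
update_fs_root() {
  new_root="$1"   # e.g. hdfs://logicalnamenode
  old_root="$2"   # e.g. hdfs://namenode:8020
  shift 2         # remaining arguments (e.g. -tablePropKey ...) pass through
  if [ "${APPLY:-0}" = "1" ]; then
    metatool -updateLocation "$new_root" "$old_root" "$@"
  else
    # -dryRun displays the changes but does not persist them
    metatool -updateLocation "$new_root" "$old_root" "$@" -dryRun
  fi
}
```

Run `update_fs_root hdfs://logicalnamenode hdfs://namenode:8020 -tablePropKey avro.schema.url` to preview, then repeat with `APPLY=1` in front to apply, and confirm the result with `metatool -listFSRoot`.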