Web interface login Apache Hadoop Cluster with Kerberos

I have a Docker stack with an Apache Hadoop (version 3.3.4) cluster, composed of one namenode and two datanodes, and a container with both the Kerberos admin server and the Kerberos KDC. I'm trying to configure Kerberos authentication on the Apache Hadoop cluster. The namenode and the datanodes connect correctly to the Kerberos container and to each other using the Kerberos principals. However, authentication to the web interface of the namenode doesn't work and I get the following error:

[screenshot of the error returned by the namenode web interface: HTTP 401]

The details of my configuration are the following.

I have four containers in my Docker stack:

  • The namenode with hostname "namenodehostname.host.com" and alias "namenode";
  • First datanode with hostname "datanode1hostname.host.com" and alias "datanode1";
  • Second datanode with hostname "datanode2hostname.host.com" and alias "datanode2";
  • Kerberos server and kdc with hostname "krb5.host.com" and alias "kerberos".

The namenode and the datanodes start with a custom user "hadoop" created in the Dockerfile.

All four containers have the following /etc/hosts file:

namenode  namenodehostname.host.com
datanode1 datanode1hostname.host.com
datanode2 datanode2hostname.host.com
kerberos krb5.host.com

The file krb5.conf (on the namenode, the datanodes, and the Kerberos container) is:

[libdefaults]
    default_realm = TESTREALM

# The following krb5.conf variables are only for MIT Kerberos.
    kdc_timesync = 1
    ccache_type = 4
    forwardable = true
    proxiable = true

# The following encryption type specification will be used by MIT Kerberos
# if uncommented.  In general, the defaults in the MIT Kerberos code are
# correct and overriding these specifications only serves to disable new
# encryption types as they are added, creating interoperability problems.
#
# The only time when you might need to uncomment these lines and change
# the enctypes is if you have local software that will break on ticket
# caches containing ticket encryption types it doesn't know about (such as
# old versions of Sun Java).
#       default_tgs_enctypes = des3-hmac-sha1
#       default_tkt_enctypes = des3-hmac-sha1
#       permitted_enctypes = des3-hmac-sha1

# The following libdefaults parameters are only for Heimdal Kerberos.
    fcc-mit-ticketflags = true

[realms]
    TESTREALM = {
        kdc = krb5.host.com
        admin_server = krb5.host.com
    } 
[domain_realm]
    .host.com = TESTREALM
    host.com = TESTREALM

The file kdc.conf (in the Kerberos container) is:

[kdcdefaults]
    kdc_ports = 750,88 

[realms]
    TESTREALM = {
        database_name = /etc/krb5kdc/data/database/principal
        admin_keytab = FILE:/etc/krb5kdc/data/keytabs/kadm5.keytab
        acl_file = /etc/krb5kdc/kadm5.acl
        key_stash_file = /etc/krb5kdc/data/stashfile/stash
        kdc_ports = 750,88
        max_life = 10h 0m 0s
        max_renewable_life = 7d 0h 0m 0s
        master_key_type = des3-hmac-sha1
        #supported_enctypes = aes256-cts:normal aes128-cts:normal
        default_principal_flags = +preauth
    }
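
For completeness, the KDC database and stash file referenced above can be created with kdb5_util. A minimal sketch (the realm and paths match the kdc.conf above; the master password is chosen interactively):

# run inside the Kerberos container: creates the principal database and stashes the master key
kdb5_util create -r TESTREALM -s \
    -d /etc/krb5kdc/data/database/principal \
    -sf /etc/krb5kdc/data/stashfile/stash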

I created five principals in the Kerberos container:

  • root/admin
  • nn/namenodehostname.host.com: used by the namenode
  • HTTP/namenodehostname.host.com: used by the namenode
  • dn/datanode1hostname.host.com: used by first datanode
  • dn/datanode2hostname.host.com: used by second datanode

All these principals, except for root/admin, are mapped to the user "hadoop" on the namenode and the datanodes (see the property hadoop.security.auth_to_local in the core-site.xml file). I also created a keytab file for each principal ending in *.host.com.
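
For reference, the principals and keytab files can be created along these lines (a sketch with kadmin.local on the Kerberos container; the keytab paths match the ones referenced in the configuration files below):

# service principals with random keys
kadmin.local -q "addprinc -randkey nn/namenodehostname.host.com@TESTREALM"
kadmin.local -q "addprinc -randkey HTTP/namenodehostname.host.com@TESTREALM"
kadmin.local -q "addprinc -randkey dn/datanode1hostname.host.com@TESTREALM"
kadmin.local -q "addprinc -randkey dn/datanode2hostname.host.com@TESTREALM"

# one keytab per service principal (repeated for each datanode), then copied to /etc/security/keytab/ on the matching container
kadmin.local -q "ktadd -k /etc/security/keytab/nn.service.keytab nn/namenodehostname.host.com@TESTREALM"
kadmin.local -q "ktadd -k /etc/security/keytab/spnego.service.keytab HTTP/namenodehostname.host.com@TESTREALM"
kadmin.local -q "ktadd -k /etc/security/keytab/dn.service.keytab dn/datanode1hostname.host.com@TESTREALM"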

The namenode is configured with the following files:

core-site.xml

<configuration>
<property><name>fs.defaultFS</name><value>hdfs://namenodehostname.host.com:9000</value></property>
<property><name>hadoop.security.authentication</name><value>kerberos</value></property>
<property><name>hadoop.security.authorization</name><value>true</value></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value></property>
<property><name>hadoop.security.auth_to_local</name><value>
RULE:[1:$1](nn/namenodehostname.host.com@TESTREALM)s/^.*$/hadoop/
RULE:[1:$1](dn/datanode1hostname.host.com@TESTREALM)s/^.*$/hadoop/
RULE:[1:$1](dn/datanode2hostname.host.com@TESTREALM)s/^.*$/hadoop/
RULE:[1:$1](http/namenodehostname.host.com@TESTREALM)s/^.*$/hadoop/
DEFAULT</value></property>
<property><name>hadoop.http.filter.initializers</name><value>org.apache.hadoop.security.AuthenticationFilterInitializer</value></property>
<property><name>hadoop.http.authentication.token.validity</name><value>3600</value></property>
<property><name>hadoop.http.authentication.cookie.domain</name><value>host.com</value></property>
<property><name>hadoop.http.authentication.cookie.persistent</name><value>false</value></property>
<property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>false</value></property>
<property><name>hadoop.http.authentication.kerberos.principal</name><value>http/namenodehostname.host.com@TESTREALM</value></property>
<property><name>hadoop.http.authentication.kerberos.keytab</name><value>/etc/security/keytab/spnego.service.keytab</value></property>
</configuration>
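
As a sanity check, the local user that these auth_to_local rules map a principal to can be printed with the hadoop kerbname command (available in Hadoop 3), for example:

# prints the local user each principal is mapped to by hadoop.security.auth_to_local
hadoop kerbname nn/namenodehostname.host.com@TESTREALM
hadoop kerbname HTTP/namenodehostname.host.com@TESTREALM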

hdfs-site.xml

<configuration>
<property><name>dfs.namenode.name.dir</name><value>file:///home/hadoop/hadoopdata/hdfs/namenode</value></property>
<property><name>dfs.namenode.edits.dir</name><value>file:///home/hadoop/hadooplogs/hdfs/edits</value></property>
<property><name>dfs.replication</name><value>1</value></property>
<property><name>dfs.datanode.http.address</name><value>0.0.0.0:8108</value></property>
<property><name>dfs.datanode.https.address</name><value>0.0.0.0:8109</value></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.namenode.acls.enabled</name><value>false</value></property>
<property><name>dfs.namenode.posix.acl.inheritance.enabled</name><value>true</value></property>
<property><name>dfs.permissions.enabled</name><value>false</value></property>
<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
<property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.block.access.token.enable</name><value>true</value></property>
<property><name>dfs.namenode.keytab.file</name><value>/etc/security/keytab/nn.service.keytab</value></property>
<property><name>dfs.namenode.kerberos.principal</name><value>nn/namenodehostname.host.com@TESTREALM</value></property>
<property><name>dfs.namenode.kerberos.internal.spnego.principal</name><value>HTTP/namenodehostname.host.com@TESTREALM</value></property>
<property><name>dfs.web.authentication.kerberos.keytab</name><value>/etc/security/keytab/spnego.service.keytab</value></property>
<property><name>dfs.web.authentication.kerberos.principal</name><value>HTTP/namenodehostname.host.com@TESTREALM</value></property>
<property><name>dfs.http.policy</name><value>HTTPS_ONLY</value></property>
</configuration>
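
For reference, SPNEGO against the namenode web UI can also be exercised from the command line with curl once a ticket is in the cache. A sketch (port 9871 is only the Hadoop default for dfs.namenode.https-address, which is not overridden above; -k skips verification of the self-signed certificate):

kinit -kt /etc/security/keytab/nn.service.keytab nn/namenodehostname.host.com@TESTREALM
curl --negotiate -u : -k "https://namenodehostname.host.com:9871/webhdfs/v1/?op=LISTSTATUS"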

The first datanode is configured with the following files:

core-site.xml

<configuration>
<property><name>fs.defaultFS</name><value>hdfs://namenodehostname.host.com:9000</value></property>
<property><name>hadoop.security.authentication</name><value>kerberos</value></property>
<property><name>hadoop.security.authorization</name><value>true</value></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value></property>
<property><name>hadoop.security.auth_to_local</name><value>
RULE:[1:$1](nn/namenodehostname.host.com@TESTREALM)s/^.*$/hadoop/
RULE:[1:$1](dn/datanode1hostname.host.com@TESTREALM)s/^.*$/hadoop/
RULE:[1:$1](dn/datanode2hostname.host.com@TESTREALM)s/^.*$/hadoop/
RULE:[1:$1](http/namenodehostname.host.com@TESTREALM)s/^.*$/hadoop/
DEFAULT</value></property>
<property><name>hadoop.http.filter.initializers</name><value>org.apache.hadoop.security.AuthenticationFilterInitializer</value></property>
<property><name>hadoop.http.authentication.token.validity</name><value>3600</value></property>
<property><name>hadoop.http.authentication.cookie.domain</name><value>host.com</value></property>
<property><name>hadoop.http.authentication.cookie.persistent</name><value>false</value></property>
<property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>false</value></property>
<property><name>hadoop.http.authentication.kerberos.principal</name><value>http/namenodehostname.host.com@TESTREALM</value></property>
<property><name>hadoop.http.authentication.kerberos.keytab</name><value>/etc/security/keytab/spnego.service.keytab</value></property>
</configuration>

The second datanode has the same core-site.xml file as the first datanode and the following hdfs-site.xml file.

<configuration>
<property><name>dfs.datanode.data.dir</name><value>file:///home/hadoop/hadoopdata/hdfs/datanode</value></property>
<property><name>dfs.datanode.failed.volumes.tolerated</name><value>0</value></property>
<property><name>dfs.datanode.address</name><value>0.0.0.0:8100</value></property>
<property><name>dfs.datanode.http.address</name><value>0.0.0.0:8108</value></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.permissions.enabled</name><value>false</value></property>
<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
<property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.datanode.hostname</name><value>datanode2hostname.host.com</value></property>
<property><name>dfs.block.access.token.enable</name><value>true</value></property>
<property><name>dfs.datanode.data.dir.perm</name><value>700</value></property>
<property><name>dfs.datanode.keytab.file</name><value>/etc/security/keytab/dn.service.keytab</value></property>
<property><name>dfs.datanode.kerberos.principal</name><value>dn/datanode2hostname.host.com@TESTREALM</value></property>
<property><name>dfs.encrypt.data.transfer</name><value>false</value></property>
<property><name>dfs.datanode.https.address</name><value>0.0.0.0:8109</value></property>
<property><name>dfs.data.transfer.protection</name><value>authentication</value></property>
<property><name>dfs.http.policy</name><value>HTTPS_ONLY</value></property>
</configuration>

The namenode and datanodes have the following ssl-server.xml file.

<configuration>
<property><name>ssl.server.keystore.location</name><value>/home/hadoop/keystore.jks</value></property>
<property><name>ssl.server.keystore.password</name><value>password123.</value></property>
<property><name>ssl.server.keystore.type</name><value>JKS</value></property>
<property><name>ssl.server.truststore.location</name><value>/home/hadoop/truststore.jks</value></property>
<property><name>ssl.server.truststore.password</name><value>password123.</value></property>
<property><name>ssl.server.truststore.type</name><value>JKS</value></property>
</configuration>
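
The keystore and truststore referenced above can be generated with keytool along these lines (a sketch using a self-signed certificate; the alias and validity are arbitrary):

# self-signed server certificate for the namenode, then exported and imported into the truststore
keytool -genkeypair -alias namenode -keyalg RSA -keysize 2048 -validity 365 \
    -dname "CN=namenodehostname.host.com" \
    -keystore /home/hadoop/keystore.jks -storepass password123.
keytool -exportcert -alias namenode -keystore /home/hadoop/keystore.jks \
    -storepass password123. -file namenode.crt
keytool -importcert -alias namenode -file namenode.crt -noprompt \
    -keystore /home/hadoop/truststore.jks -storepass password123.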

Is there anything else I need to do to be able to log in to the namenode web interface?

1 Answer

Answer by uds0128:

Kerberos is very strict about domain names and hostnames. I'm not sure what is causing the problem in your case, but here are some checks I used to debug a similar issue when I was facing the same thing.

First of all, as I understand it, a 401 means that the credentials or token required for authentication were not provided.

Possible reasons why the Kerberos credentials are not provided:

You mentioned that you are using a Docker stack. It might be the case that you bound the port of the container running the namenode or resourcemanager daemons to a localhost port, and that you access it as localhost:port. In that case the domain name the browser sees is localhost, while in the Firefox settings you set network.negotiate-auth.trusted-uris = .host.com and network.automatic-ntlm-auth.trusted-uris = .host.com, which tell Firefox to negotiate via the GSS libraries (used by SPNEGO, and ultimately to access the Hadoop UIs) only when the site being accessed matches a domain listed in those settings. So if you access the UI as localhost:port, the domain is localhost, Firefox will never negotiate for a ticket, and you will get a 401.
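
For reference, the two Firefox about:config preferences mentioned above are:

network.negotiate-auth.trusted-uris = .host.com
network.automatic-ntlm-auth.trusted-uris = .host.com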

To solve this issue, note first that even if you add localhost to those Firefox settings, you will then start getting a 403 error with a message like "malformed GSS token" or "invalid GSS credentials". As I understand it, this means the credentials were negotiated, but your Kerberos TGT did not yield a service ticket for the service you are trying to reach; here that service is HTTP/namenodehostname.host.com@TESTREALM.

So, if your Firefox is on Windows and you are accessing servers inside containers, you have to expose the Docker container port and bind it to the Windows host machine's localhost. You can then modify your "C:\Windows\System32\drivers\etc\hosts" file to point the hostname namenodehostname.host.com to 127.0.0.1 and access the site as namenodehostname.host.com:port. (Note: for this you have to install the MIT Kerberos for Windows client and modify its client configuration so that it can contact the KDC. If your KDC is running inside a container, that means binding localhost:someport to kdccontainer:kdcport and then editing krb5.ini on Windows so that the TESTREALM realm has kdc = localhost:someport; the Windows krb5 client can then reach the KDC inside the container.)
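
For example, a minimal krb5.ini on the Windows host might look like this (the host port 8888 is just an assumption for whichever port you bound the KDC container's port 88 to):

[libdefaults]
    default_realm = TESTREALM

[realms]
    TESTREALM = {
        kdc = localhost:8888
        admin_server = localhost:8888
    }

[domain_realm]
    .host.com = TESTREALM
    host.com = TESTREALM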

If Firefox is on Windows and you are accessing the UI via some other hostname (for example, the containers are in a remote VM and you reach the UI from Windows by some mechanism other than localhost, so the hostname points to an IP other than 127.0.0.1), that is fine; in that case you only have to make sure that the Windows MIT Kerberos client can contact the KDC and obtain a ticket.

If Firefox is on Linux and the Docker containers run on the same machine, the IPs allocated to the containers are directly reachable when the containers use the bridge network driver. In this case don't bind the container ports to localhost; instead, add the container hostnames to /etc/hosts so that namenodehostname.host.com points to the IP of the container running the namenode or resourcemanager UI, and access it as namenodehostname.host.com:port. Here too you need a valid Kerberos ticket, so modify /etc/krb5.conf (the Kerberos client configuration) to contact the correct KDC, then open a terminal as a non-root user, run kinit to get a valid ticket, and launch Firefox from that terminal. Also check the possibilities described here: https://superuser.com/questions/1707537/firefox-and-chromium-dont-do-kerberos-negotiation-curl-do
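
A sketch of this case (the container name "namenode" and the principal used with kinit are assumptions, as is the HTTPS port 9871, which is only the Hadoop default; use whatever principal your KDC knows):

# find the bridge IP of the namenode container and publish it under its Kerberos hostname
NN_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' namenode)
echo "$NN_IP namenodehostname.host.com namenode" | sudo tee -a /etc/hosts

# get a ticket as a non-root user, verify it, then start Firefox from the same shell
kinit someuser@TESTREALM
klist
firefox https://namenodehostname.host.com:9871 &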

If Firefox is on Linux and the containers run in a remote VM, make sure your host machine accesses the website using namenodehostname.host.com only (the same holds for a Windows host), and make sure the host can contact the correct remote KDC.

Another thing you can do is tail the KDC log. After getting a valid Kerberos ticket with your Kerberos client, the first time you hit the UI URL, SPNEGO will make the browser contact the KDC to get a ticket for HTTP/fqdn. Check that fqdn: it should be namenodehostname.host.com. If you see some other fqdn, the browser is requesting a ticket for the wrong service; as said above, we have to access HTTP/namenodehostname.host.com@TESTREALM, so Firefox should request a ticket for exactly that service.
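
A sketch of that check (the log path is an assumption; see the [logging] section of your krb5.conf/kdc.conf):

# on the Kerberos container: watch which service principal the TGS requests are for
tail -f /var/log/krb5kdc.log

# from the client, you can also request the service ticket explicitly and inspect the cache
kvno HTTP/namenodehostname.host.com@TESTREALM
klist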

The 403 error comes when Firefox requests HTTP/someotherfqdn. Say you are on Windows and access the UI via localhost: Firefox will request, for example, HTTP/windowsusername.somedomain (just an example), while the service runs as HTTP/namenodehostname.host.com. In this case credentials are negotiated, but they are not valid for the required service, and you get a 403.

If your nodes have multiple hostnames, this issue occurs more often. In such cases the KDC logs can help you.

Sorry for the long passages; you may find some repeated content and unstructured writing.

Additionally, have a look at my reply to my own question here.