Why does DNS addressability between namespaces depend on pod definition type?


I'm running into a challenge with creating a mongo replica set that spans two namespaces in a single cluster. I thought I had things worked out but now I'm confused about how NodePort services interact with pods.

I had initially been able to get the replica set working using a fairly simple mongo pod definition (i.e., kind: Pod) and a NodePort service definition (I will ultimately need external visibility). Here's the NodePort definition for ns0:

---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
  - name: mongo
    port: 30000
    targetPort: 30000
    nodePort: 30000

This configuration created DNS names like mongo.mongo.ns0.svc.cluster.local that can be resolved from pods in both ns0 and ns1.
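For example, I can verify resolution from ns1 with a throwaway pod (the pod name and image here are arbitrary, just a sketch):

kubectl -n ns1 run dns-check --rm -it --restart=Never \
  --image=busybox:1.36 \
  -- nslookup mongo.mongo.ns0.svc.cluster.local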

Later, when attempting to apply some changes to a previously deployed set of pods, I learned that there are certain fields you cannot update on a pod. That helped me understand why bare pods are not typically used in production, so I modified the yaml to use kind: StatefulSet instead of kind: Pod. (Sorry for the awkward language here; I'm trying to avoid the term 'deployment' since it means something specific in k8s.)

This resulted in a new DNS name for the pod, mongo-0.mongo.ns0.svc.cluster.local, which is resolvable within ns0. However, that name is not resolvable from ns1, even though nothing was changed about the NodePort definition. This made me question whether the NodePort service was even necessary for the kind: Pod setup, but when I remove it, I can no longer connect to the node in ns0 from a pod in ns1.

This behavior seems to be consistent across different k8s platforms. Can someone help me understand what is going on? I think I might be confused about how k8s DNS is supposed to work. I just need a single stable (not load-balanced) DNS name per mongo node in each namespace that is reachable from other namespaces. Why would the way the pod is defined change whether the pod is addressable from another namespace? Are the pod names created by a kind: StatefulSet somehow different from those created by a kind: Pod in terms of DNS?

Working example of DNS differences

Starting with two empty kubernetes namespaces, ns0 and ns1, we can create two Persistent Volume Claims (PVCs), one per namespace. I don't think this relates to DNS, but I have had trouble getting mongo to run in a k8s environment without it, so I am including it for completeness:

volume.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
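Assuming the file is saved as volume.yaml, applying it to both namespaces looks something like:

kubectl -n ns0 apply -f volume.yaml
kubectl -n ns1 apply -f volume.yaml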

Once this PVC has been created in both namespaces, we can deploy mongo. Here are the pod definitions for ns0 (A) and ns1 (B). These should be identical aside from the port numbers (30000 and 30001 respectively) and the namespace reference in the bind parameter.

mongo-a.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
  - name: mongo
    port: 30000
    targetPort: 30000
    nodePort: 30000
---
apiVersion: v1
kind: Pod
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  hostname: mongo
  subdomain: mongo
  terminationGracePeriodSeconds: 10
  volumes:
    - name: mongodb-data
      persistentVolumeClaim:
        claimName: mongodb-data
  containers:
    - name: mongo
      image: mongo:latest
      command:
        - mongod
        - "--port"
        - "30000"
        - "--replSet"
        - rs
        - "--bind_ip"
        - "localhost,mongo.mongo.ns0.svc.cluster.local"
      volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
      ports:
        - containerPort: 30000

While in ns0, we can create/apply this definition.
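For example (assuming the file is saved as mongo-a.yaml and ns0 is the current namespace):

kubectl apply -f mongo-a.yaml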

Once created, we can exec into the mongo pod and verify it is running using its DNS host name:

mongosh mongodb://mongo.mongo.ns0.svc.cluster.local:30000

Now, switching to ns1 (kubectl config use-context ns1) we can apply/create the second mongo node.

mongo-b.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
  - name: mongo
    port: 30001
    targetPort: 30001
    nodePort: 30001
---
apiVersion: v1
kind: Pod
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  hostname: mongo
  subdomain: mongo
  terminationGracePeriodSeconds: 10
  volumes:
    - name: mongodb-data
      persistentVolumeClaim:
        claimName: mongodb-data
  containers:
    - name: mongo
      image: mongo:latest
      command:
        - mongod
        - "--port"
        - "30001"
        - "--replSet"
        - rs
        - "--bind_ip"
        - "localhost,mongo.mongo.ns1.svc.cluster.local"
      volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
      ports:
        - containerPort: 30001

Now we can exec into the second node and confirm using:

mongosh mongodb://mongo.mongo.ns1.svc.cluster.local:30001

Now that we have confirmed both sides are running, we can connect from one to the other. Exit mongosh on the mongo-b node (ns1) and connect to the node in ns0 using:

mongosh mongodb://mongo.mongo.ns0.svc.cluster.local:30000

This should connect successfully. From the mongo-a pod in ns0, you can likewise connect to the mongo-b node.
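At this point, initiating the replica set should be possible from either node with something along these lines (just a sketch, not part of the reproduction steps):

rs.initiate({
  _id: "rs",
  members: [
    { _id: 0, host: "mongo.mongo.ns0.svc.cluster.local:30000" },
    { _id: 1, host: "mongo.mongo.ns1.svc.cluster.local:30001" }
  ]
})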

However, when I convert the pod definition to a stateful set:

mongo-a.stateful.yaml

apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
  - name: mongo
    port: 30000
    targetPort: 30000
    nodePort: 30000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      hostname: mongo
      subdomain: mongo
      terminationGracePeriodSeconds: 10
      volumes:
        - name: mongodb-data
          persistentVolumeClaim:
            claimName: mongodb-data
      containers:
        - name: mongo
          image: mongo:latest
          command:
            - mongod
            - "--port"
            - "30000"
            - "--replSet"
            - rs
            - "--bind_ip"
            - "localhost,mongo-0.mongo.ns0.svc.cluster.local"
          volumeMounts:
            - name: mongodb-data
              mountPath: /data/db
          ports:
            - containerPort: 30000

I can exec into the mongo-0 pod in ns0 and connect to the local node using:

mongosh mongodb://mongo-0.mongo.ns0.svc.cluster.local:30000

However, from a mongo node running in ns1, that command fails with: MongoNetworkError: getaddrinfo ENOTFOUND mongo-0.mongo.ns0.svc.cluster.local
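The failure can be reproduced outside of mongosh with a plain DNS lookup from any pod in ns1, e.g. (pod name and image are arbitrary):

kubectl -n ns1 run dns-check --rm -it --restart=Never \
  --image=busybox:1.36 \
  -- nslookup mongo-0.mongo.ns0.svc.cluster.local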

So, to recap: when defined as 'raw' pods, both of these names:

mongo.mongo.ns0.svc.cluster.local:30000
mongo.mongo.ns1.svc.cluster.local:30001

are resolvable from either namespace. But when defined as StatefulSets, these names:

mongo-0.mongo.ns0.svc.cluster.local:30000
mongo-0.mongo.ns1.svc.cluster.local:30001

can only be resolved within their own namespace.

I'd like to understand why. I've tried messing around with service definitions but I'm mostly just flailing and nothing I've tried has worked to make these resolvable from the other namespace.


Answer from larsks:

In order for Pods in a StatefulSet to have the stable DNS names you're looking for, the service must be headless:

A StatefulSet can use a Headless Service to control the domain of its Pods. The domain managed by this Service takes the form: $(service name).$(namespace).svc.cluster.local, where "cluster.local" is the cluster domain. As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain), where the governing service is defined by the serviceName field on the StatefulSet.

A headless service is one in which the clusterIP is explicitly None. This means you cannot use a NodePort service for this purpose (because a NodePort service must have a non-None clusterIP).

So for those to work the way you want, you would need something like this:

apiVersion: v1
kind: Namespace
metadata:
  name: ns0
spec:
  finalizers:
  - kubernetes
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: mongo
  name: mongo
  namespace: ns0
spec:
  clusterIP: None
  ports:
  - name: mongo
    port: 27017
    targetPort: 27017
  selector:
    name: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    name: mongo
  name: mongo
  namespace: ns0
spec:
  replicas: 2
  selector:
    matchLabels:
      name: mongo
  serviceName: mongo
  template:
    metadata:
      labels:
        name: mongo
    spec:
      containers:
      - env:
        - name: MONGO_INITDB_ROOT_USERNAME
          value: root
        - name: MONGO_INITDB_ROOT_PASSWORD
          value: example
        image: docker.io/mongo:7
        name: mongo
        ports:
        - containerPort: 27017
          name: mongo
        # mount the PVC from volumeClaimTemplates so mongo data lands on the volume
        volumeMounts:
        - mountPath: /data/db
          name: mongo-data
  volumeClaimTemplates:
  - metadata:
      labels:
        name: mongo
      name: mongo-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

And similarly for namespace ns1. With this in place, from another pod running in ns0, we can look up the mongo pods in the current namespace:

/ # host mongo-0.mongo.ns0.svc.cluster.local
mongo-0.mongo.ns0.svc.cluster.local has address 10.244.0.42
/ # host mongo-1.mongo.ns0.svc.cluster.local
mongo-1.mongo.ns0.svc.cluster.local has address 10.244.0.46

And in other namespaces:

/ # host mongo-0.mongo.ns1.svc.cluster.local
mongo-0.mongo.ns1.svc.cluster.local has address 10.244.0.43
/ # host mongo-1.mongo.ns1.svc.cluster.local
mongo-1.mongo.ns1.svc.cluster.local has address 10.244.0.47
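Lookups like these can be run from any pod that has a DNS client; for example, a throwaway busybox pod using nslookup instead of host:

kubectl -n ns0 run dns-test --rm -it --restart=Never \
  --image=busybox:1.36 \
  -- nslookup mongo-0.mongo.ns1.svc.cluster.local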

If you also want to expose your service using a NodePort, you could create a second Service resource, e.g.:

apiVersion: v1
kind: Service
metadata:
  labels:
    name: mongo
  name: mongo-nodeport
  namespace: ns0
spec:
  type: NodePort
  ports:
  - name: mongo
    targetPort: 27017
    port: 27017

    # Must be in range 30000-32767
    nodePort: 30017
  selector:
    name: mongo
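With that in place, clients outside the cluster can connect through any node's address on the node port, something like (the node address is a placeholder):

mongosh mongodb://<node-address>:30017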

Follow-up question from the asker: can you help me understand, or point me to a reference for, why a directly created pod gets a cluster-level DNS name but those created by a StatefulSet do not?

I don't believe that Pods get a cluster-level DNS name by default. If I deploy these resources:

apiVersion: v1
kind: Service
metadata:
  labels:
    name: mongo
  name: mongo-svc
  namespace: ns0
spec:
  ports:
  - name: mongo
    nodePort: 30017
    port: 27017
    targetPort: 27017
  selector:
    name: mongo
  type: NodePort
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    name: mongo
  name: mongo-pod
  namespace: ns0
spec:
  containers:
  - env:
    - name: MONGO_INITDB_ROOT_USERNAME
      value: root
    - name: MONGO_INITDB_ROOT_PASSWORD
      value: example
    image: docker.io/mongo:7
    name: mongo
    ports:
    - containerPort: 27017
      name: mongo

Then we see:

/ # host mongo-pod
Host mongo-pod not found: 3(NXDOMAIN)
/ # host mongo-svc
mongo-svc.ns0.svc.cluster.local has address 10.96.40.154

A DNS name is only allocated for the service, not for the pod. A pod is able to resolve its own name via /etc/hosts, so from within the mongo-pod pod, this will work:

root@mongo-pod:/# ping -c1 mongo-pod
PING mongo-pod (10.244.0.89): 56 data bytes
64 bytes from 10.244.0.89: icmp_seq=0 ttl=64 time=0.018 ms
--- mongo-pod ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.018/0.018/0.018/0.000 ms

But that won't work elsewhere, even from other pods in the same namespace, where we would see something like:

/ # ping mongo-pod
ping: bad address 'mongo-pod'
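If you need to reach that single pod from another namespace, the Service name is what resolves cluster-wide, so a connection string along these lines (using the example resources above) should work:

mongosh mongodb://mongo-svc.ns0.svc.cluster.local:27017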