I'm running into a challenge with creating a mongo replica set that spans two namespaces in a single cluster. I thought I had things worked out, but now I'm confused about how NodePort services interact with pods.
I had initially been able to get the replica set working using a fairly simple mongo pod definition (i.e., kind: Pod) and a NodePort service definition (I will ultimately need external visibility). Here's the NodePort definition for ns0:
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
    - name: mongo
      port: 30000
      targetPort: 30000
      nodePort: 30000
This configuration created DNS names like mongo.mongo.ns0.svc.cluster.local that can be addressed from pods in both ns0 and ns1.
Later, when attempting to apply some changes to a previously deployed set of pods, I learned that there are certain things you cannot update for a pod. I think that helped me understand why pods are not typically used in production, and I modified the yaml to use a kind: StatefulSet instead of kind: Pod. (Sorry for the awkward language here; I'm trying to avoid the term 'deployment' since it means something specific in k8s.)
This resulted in a new DNS name for the pod, mongo-0.mongo.ns0.svc.cluster.local, which is reachable within ns0. However, that name is not addressable from ns1. Nothing was changed about the NodePort definition. This made me question whether the NodePort was even necessary for the kind: Pod definition, but when I remove it, I can no longer connect to the node in ns0 from a pod in ns1.
This behavior seems to be consistent across different k8s platforms. Can someone help me understand what is going on? I think I might be confused about how k8s DNS is supposed to work. I just need a single stable (not load balanced) DNS name per mongo node in each namespace that is reachable from other namespaces. Why would the way the pod is defined change whether it is addressable from another namespace? Are the pod names created by a kind: StatefulSet somehow different, in terms of DNS, from those created by a kind: Pod?
Working example of DNS differences
Starting with two empty kubernetes namespaces ns0 and ns1, we can create two Persistent Volume Claims (PVCs). I don't think this relates to DNS, but I have had trouble getting mongo to run in a k8s environment without it, so I am including this for completeness:
volume.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
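For example, the claim might be created in each namespace with something like this (a sketch; using kubectl -n here is equivalent to switching contexts as done further below):
kubectl -n ns0 apply -f volume.yaml
kubectl -n ns1 apply -f volume.yaml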
Once this PVC has been created in both namespaces, we can deploy mongo. Here are the pod definitions for ns0 (A) and ns1 (B). These should be identical aside from the port numbers (30000 and 30001, respectively) and the namespace reference in the bind parameter.
mongo-a.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
    - name: mongo
      port: 30000
      targetPort: 30000
      nodePort: 30000
---
apiVersion: v1
kind: Pod
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  hostname: mongo
  subdomain: mongo
  terminationGracePeriodSeconds: 10
  volumes:
    - name: mongodb-data
      persistentVolumeClaim:
        claimName: mongodb-data
  containers:
    - name: mongo
      image: mongo:latest
      command:
        - mongod
        - "--port"
        - "30000"
        - "--replSet"
        - rs
        - "--bind_ip"
        - "localhost,mongo.mongo.ns0.svc.cluster.local"
      volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
      ports:
        - containerPort: 30000
While in ns0, we can create/apply this definition. Once created, we can exec into the mongo pod and verify it is running using its DNS host name:
mongosh mongodb://mongo.mongo.ns0.svc.cluster.local:30000
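For reference, the apply and exec steps described above might look something like this (a sketch; -n ns0 is equivalent to switching the kubectl context to that namespace):
kubectl -n ns0 apply -f mongo-a.yaml
kubectl -n ns0 exec -it mongo -- mongosh mongodb://mongo.mongo.ns0.svc.cluster.local:30000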
Now, switching to ns1 (kubectl config use-context ns1), we can apply/create the second mongo node.
mongo-b.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
    - name: mongo
      port: 30001
      targetPort: 30001
      nodePort: 30001
---
apiVersion: v1
kind: Pod
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  hostname: mongo
  subdomain: mongo
  terminationGracePeriodSeconds: 10
  volumes:
    - name: mongodb-data
      persistentVolumeClaim:
        claimName: mongodb-data
  containers:
    - name: mongo
      image: mongo:latest
      command:
        - mongod
        - "--port"
        - "30001"
        - "--replSet"
        - rs
        - "--bind_ip"
        - "localhost,mongo.mongo.ns1.svc.cluster.local"
      volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
      ports:
        - containerPort: 30001
Now we can exec into the second node and confirm using:
mongosh mongodb://mongo.mongo.ns1.svc.cluster.local:30001
Now that we have confirmed both sides are running, we can connect from one to the other. Exit mongosh on the mongo-b node (ns1) and connect to the other node in ns0 using:
mongosh mongodb://mongo.mongo.ns0.svc.cluster.local:30000
This should connect successfully. From the mongo-a pod in ns0, you can likewise connect to the mongo-b node.
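For context, once both nodes are reachable from each other, the replica set initiation I have in mind would look roughly like this (a sketch run from mongosh on either node; the member list simply uses the two stable DNS names above, and "rs" matches the --replSet flag):
rs.initiate({
  _id: "rs",
  members: [
    { _id: 0, host: "mongo.mongo.ns0.svc.cluster.local:30000" },
    { _id: 1, host: "mongo.mongo.ns1.svc.cluster.local:30001" }
  ]
})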
However, when I convert the pod definition to a stateful set:
mongo-a.stateful.yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    name: mongo
  type: NodePort
  ports:
    - name: mongo
      port: 30000
      targetPort: 30000
      nodePort: 30000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      hostname: mongo
      subdomain: mongo
      terminationGracePeriodSeconds: 10
      volumes:
        - name: mongodb-data
          persistentVolumeClaim:
            claimName: mongodb-data
      containers:
        - name: mongo
          image: mongo:latest
          command:
            - mongod
            - "--port"
            - "30000"
            - "--replSet"
            - rs
            - "--bind_ip"
            - "localhost,mongo-0.mongo.ns0.svc.cluster.local"
          volumeMounts:
            - name: mongodb-data
              mountPath: /data/db
          ports:
            - containerPort: 30000
I can exec into the ns0 mongo-0 pod and connect to the local node using:
mongosh mongodb://mongo-0.mongo.ns0.svc.cluster.local:30000
However, from a mongo node running in ns1, that command fails with:
MongoNetworkError: getaddrinfo ENOTFOUND mongo-0.mongo.ns0.svc.cluster.local
So, to recap: when defined as 'raw' pods, both:
mongo.mongo.ns0.svc.cluster.local:30000
mongo.mongo.ns1.svc.cluster.local:30001
are resolvable from either namespace. But when defined as StatefulSets:
mongo-0.mongo.ns0.svc.cluster.local:30000
mongo-0.mongo.ns1.svc.cluster.local:30001
can only be resolved within the corresponding namespace.
I'd like to understand why. I've tried messing around with service definitions but I'm mostly just flailing and nothing I've tried has worked to make these resolvable from the other namespace.
In order for Pods in a StatefulSet to have the stable DNS names you're looking for, the service must be headless:
A headless service is one in which the clusterIP is explicitly None. This means you cannot use a NodePort service for this purpose (because a NodePort service must have a non-None clusterIP). So for those to work the way you want, you would need something like this:
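For instance, a minimal sketch of such a headless Service, reusing the names and port from the manifests above (the essential parts are clusterIP: None and a selector that matches the StatefulSet's app: mongo pod labels rather than name: mongo):
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  clusterIP: None       # headless: no cluster IP; DNS resolves to the pod IPs
  selector:
    app: mongo          # must match the StatefulSet pod template labels
  ports:
    - name: mongo
      port: 30000
      targetPort: 30000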
And similarly for namespace ns1. With this in place, from another pod running in ns0 we can look up the mongo pods in the current namespace, and in other namespaces as well:
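A sketch of what those lookups could look like (assuming nslookup is available in the pod, and that the equivalent headless Service exists in ns1):
# From a pod in ns0: the mongo pod in the current namespace resolves by its short, service-scoped name
nslookup mongo-0.mongo
# The mongo pod in another namespace resolves by its fully qualified name
nslookup mongo-0.mongo.ns1.svc.cluster.local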
If you also want to expose your service using a NodePort, you could create a second Service resource alongside the headless one, e.g. one with the same selector and ports but with type: NodePort (that one will get its own, non-None clusterIP).
I don't believe that Pods get a cluster-level DNS name by default. If I deploy these resources:
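Something along these lines (a sketch; mongo-pod is the Pod name referenced below, while the Service name and label are assumptions made for illustration):
apiVersion: v1
kind: Pod
metadata:
  name: mongo-pod
  labels:
    app: mongo-pod        # assumed label, matched by the Service selector
spec:
  containers:
    - name: mongo
      image: mongo:latest
      ports:
        - containerPort: 27017
---
apiVersion: v1
kind: Service
metadata:
  name: mongo-pod         # assumed Service name
spec:
  selector:
    app: mongo-pod
  ports:
    - port: 27017
      targetPort: 27017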
Then we see that a DNS name is allocated only for the service, not for the pod. A pod is able to resolve its own name via /etc/hosts, so from within the mongo-pod pod itself the name mongo-pod will resolve. But that won't work elsewhere, even from other pods in the same namespace, where the same lookup would simply fail to resolve (NXDOMAIN). For example:
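A sketch of both checks (the kubectl exec commands are illustrative; getent is present in glibc-based images such as mongo's, and other-pod stands in for any other pod in the same namespace):
# Inside mongo-pod: the pod's own hostname is listed in /etc/hosts, so this resolves
kubectl exec -it mongo-pod -- getent hosts mongo-pod
# From a different pod in the same namespace: the same name does not resolve
kubectl exec -it other-pod -- getent hosts mongo-pod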