Zookeeper pod can't access mounted persistent volume claim


I'm stuck on an annoying issue where my pod can't access the mounted persistent volume.

Kubeadm: v1.19.2
Docker: 19.03.13
Zookeeper image: library/zookeeper:3.6
Cluster info: Locally hosted, no cloud provider

K8s configuration:

apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  selector:
    app: zk
  ports:
    - port: 2888
      targetPort: 2888
      name: server
      protocol: TCP
    - port: 3888
      targetPort: 3888
      name: leader-election
      protocol: TCP
  clusterIP: ""
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  selector:
    app: zk
  ports:
    - name: client
      protocol: TCP
      port: 2181
      targetPort: 2181
  type: LoadBalancer
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: zk
    spec:
      volumes:
        - name: zoo-config
          configMap:
            name: zoo-config
        - name: datadir
          persistentVolumeClaim:
            claimName: zoo-pvc
      containers:
        - name: zookeeper
          imagePullPolicy: Always
          image: "library/zookeeper:3.6"
          resources:
            requests:
              memory: "1Gi"
              cpu: "0.5"
          ports:
            - containerPort: 2181
              name: client
            - containerPort: 2888
              name: server
            - containerPort: 3888
              name: leader-election
          volumeMounts:
            - name: datadir
              mountPath: /var/lib/zookeeper/data
            - name: zoo-config
              mountPath: /conf
      securityContext:
        fsGroup: 2000
        runAsUser: 1000
        runAsNonRoot: true
  volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.beta.kubernetes.io/storage-class: local-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: local-storage
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zoo-config
  namespace: default
data:
  zoo.cfg: |
    tickTime=10000
    dataDir=/var/lib/zookeeper/data
    clientPort=2181
    initLimit=10
    syncLimit=4
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: zoo-pv
  labels:
    type: local
spec:
  storageClassName: local-storage
  persistentVolumeReclaimPolicy: Retain
  hostPath:
      path: "/mnt/data"
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <node-name>
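
For reference, whether the PV/PVC side binds correctly can be checked with the usual commands (assuming the default namespace, and that the claim generated by the volumeClaimTemplate is named datadir-zk-0, as in the describe output further down):

# Confirm the PV is Bound and the generated claim points at it
kubectl get pv zoo-pv
kubectl get pvc datadir-zk-0
kubectl describe pvc datadir-zk-0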

Purely as a test, I've tried running the pod as root with the following security context, which I know is a terrible idea. This, however, caused a bunch of other issues.

securityContext:
  fsGroup: 0
  runAsUser: 0

Once the pod starts up, the logs contain the following:

Zookeeper JMX enabled by default
Using config: /conf/zoo.cfg
<log4j Warnings>
Unable to access datadir, exiting abnormally
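
With the container crash-looping, the full output from the last attempt can be pulled with:

kubectl logs zk-0 --previous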

Inspecting the pod provides me with the following information:

~$ kubectl describe pod/zk-0
Name:         zk-0
Namespace:    default
Priority:     0
Node:         <node>
Start Time:   Sat, 26 Sep 2020 15:48:00 +0200
Labels:       app=zk
              controller-revision-hash=zk-6c68989bd
              statefulset.kubernetes.io/pod-name=zk-0
Annotations:  <none>
Status:       Running
IP:           <IP>
IPs:
  IP:           <IP>
Controlled By:  StatefulSet/zk
Containers:
  zookeeper:
    Container ID:   docker://281e177d677394604785542c231d21b71f1666a22e74c1c10ef88491dad7a522
    Image:          library/zookeeper:3.6
    Image ID:       docker-pullable://zookeeper@sha256:6c051390cfae7958ff427834937c353fc6c34484f6a84b3e4bc8c512b53a16f6
    Ports:          2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    3
      Started:      Sat, 26 Sep 2020 16:04:26 +0200
      Finished:     Sat, 26 Sep 2020 16:04:27 +0200
    Ready:          False
    Restart Count:  8
    Requests:
      cpu:        500m
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /conf from zoo-config (rw)
      /var/lib/zookeeper/data from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-88x56 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-zk-0
    ReadOnly:   false
  zoo-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zoo-config
    Optional:  false
  default-token-88x56:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-88x56
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  17m                   default-scheduler  Successfully assigned default/zk-0 to <node>
  Normal   Pulled     17m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.932381527s
  Normal   Pulled     17m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.960610662s
  Normal   Pulled     17m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.959935633s
  Normal   Created    16m (x4 over 17m)     kubelet            Created container zookeeper
  Normal   Pulled     16m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.92551645s
  Normal   Started    16m (x4 over 17m)     kubelet            Started container zookeeper
  Normal   Pulling    15m (x5 over 17m)     kubelet            Pulling image "library/zookeeper:3.6"
  Warning  BackOff    2m35s (x71 over 17m)  kubelet            Back-off restarting failed container

To me, it seems like the pod has full rw access to the volume, so I'm unsure why it's still refusing to access the directory. Any help will be appreciated!
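
A way to double-check from the node side (assuming shell access to the node backing the hostPath) is to look at the actual ownership and mode of the directory, since the container runs as uid 1000 rather than root:

# On the node that provides /mnt/data for the hostPath PV
ls -ldn /mnt/data
stat -c 'uid=%u gid=%g mode=%a' /mnt/data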

1 Answer

Best answer, by Iggydv:

After quite some digging, I finally figured out why it wasn't working. The logs were actually telling me all I needed to know in the end: the mounted PersistentVolumeClaim simply did not have the correct file permissions to read from the mounted hostPath directory /mnt/data.

To fix this, in a somewhat hacky way, I gave read, write, and execute permissions to everyone:

chmod 777 /mnt/data

This is definitely not the most secure way of fixing the issue, and I would strongly advise against using it in any production-like environment.

A better approach would probably be the following:

sudo usermod -a -G 1000 1000
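
Along the same lines, a sketch of an alternative that keeps the directory closed to everyone else would be to hand ownership of the hostPath on the node to the uid/gid from the StatefulSet's securityContext (runAsUser: 1000, fsGroup: 2000):

# Run on the node that hosts /mnt/data; 1000:2000 matches
# runAsUser / fsGroup in the StatefulSet above
sudo chown -R 1000:2000 /mnt/data
sudo chmod -R 770 /mnt/data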