StatefulSet replicas on the same node share a persistent volume

I am trying to deploy a StatefulSet of IPFS replicas on three Kubernetes worker nodes (based on this repo). The first three replicas work properly, but the fourth one fails: its PersistentVolumeClaim appears to point to the same physical storage as one of the earlier replicas, so it cannot acquire the repo lock. What is the standard way to deploy many IPFS replicas in Kubernetes?

The fourth replica (ipfs-3) printed the following log:

08:44:19.785 DEBUG   cmd/ipfs: config path is /data/ipfs main.go:257
08:44:19.785  INFO   cmd/ipfs: IPFS_PATH /data/ipfs main.go:301
08:44:19.785 DEBUG   cmd/ipfs: Command cannot run on daemon. Checking if daemon is locked main.go:434
08:44:19.785 DEBUG       lock: Checking lock lock.go:32
08:44:19.785 DEBUG       lock: Can't lock file: /data/ipfs/repo.lock.
 reason: cannot acquire lock: Lock FcntlFlock of /data/ipfs/repo.lock failed: resource temporarily unavailable lock.go:44
08:44:19.785 DEBUG     fsrepo: (true)<->Lock is held at /data/ipfs fsrepo.go:302
Error: ipfs daemon is running. please stop it to run this command
Use 'ipfs daemon --help' for information about this command

Here is the YAML manifest for the StatefulSet:


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ipfs
  namespace: ipfs
spec:
  selector:
    matchLabels:
      app: ipfs
  serviceName: ipfs
  replicas: 6
  template:
    metadata:
      labels:
        app: ipfs
    spec:
      initContainers:
      - name: init-repo
        image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        command: ['/bin/sh', '/etc/ipfs-config/init.sh']
        volumeMounts:
        - name: www
          mountPath: /data/ipfs
        - name: secrets
          mountPath: /etc/ipfs-secrets
        - name: config
          mountPath: /etc/ipfs-config
      - name: init-peers
        image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
        command: ['/bin/sh', '/etc/ipfs-config/peers-kubernetes-refresh.sh']
        volumeMounts:
        - name: www
          mountPath: /data/ipfs
        - name: config
          mountPath: /etc/ipfs-config
      containers:
      - name: ipfs
        image: ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
        env:
        - name: IPFS_LOGGING
          value: debug
        command:
        - ipfs
        - daemon
        ports:
        - containerPort: 4001
          name: swarm
        - containerPort: 5001
          name: api
        - containerPort: 8080
          name: readonly
        volumeMounts:
        - name: www
          mountPath: /data/ipfs
      volumes:
      - name: secrets
        secret:
          secretName: ipfs
      - name: config
        configMap:
          name: ipfs-config
      - name: www
        persistentVolumeClaim:
          claimName: ipfs-pvc
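For comparison, a StatefulSet usually gives each replica its own storage through volumeClaimTemplates instead of a single shared claimName: the controller creates one PVC per pod, named after the template and the pod (www-ipfs-0 through www-ipfs-5 here). With the manifest above, all six pods mount ipfs-pvc, and because its hostPath directory is local to each node, replicas that land on the same node would see the same /mnt/data and therefore the same repo.lock. A minimal sketch of the change, assuming the original manual storage class and 200Mi size are kept (a fragment, not a full manifest):

```yaml
# Sketch only, not taken from the referenced repo: a fragment of the
# StatefulSet spec above. Delete the "www" entry under
# spec.template.spec.volumes and add volumeClaimTemplates at the same
# level as serviceName and replicas.
spec:
  volumeClaimTemplates:
  - metadata:
      name: www                 # must match the volumeMounts name "www"
    spec:
      storageClassName: manual  # assumption: kept from the original PVC
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 200Mi        # assumption: same size as the original PVC
```

volumeClaimTemplates cannot be added to an existing StatefulSet in place, so the StatefulSet would need to be deleted and recreated for this to take effect.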

Here is the PersistentVolume definition:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ipfs-pv
  namespace: ipfs
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"

And the PersistentVolumeClaim definition:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ipfs-pvc
  namespace: ipfs
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
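If each replica gets its own claim instead of sharing ipfs-pvc (for example through a StatefulSet volumeClaimTemplate named www, which would produce claims www-ipfs-0 to www-ipfs-5), then, assuming the manual class has no dynamic provisioner behind it, one PV has to be created in advance for every claim. A minimal sketch of one such PV, with hypothetical names and paths:

```yaml
# Hypothetical per-replica PV for pod ipfs-0; repeat for ipfs-1 ... ipfs-5,
# changing the name, path and claimRef each time.
# PersistentVolumes are cluster-scoped, so no namespace is set here.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ipfs-pv-0                # hypothetical name
spec:
  storageClassName: manual
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data/ipfs-0"     # hypothetical per-replica directory
    type: DirectoryOrCreate      # create the directory on the node if missing
  claimRef:                      # optional: pre-bind to the claim for pod ipfs-0
    namespace: ipfs
    name: www-ipfs-0             # assumes a volumeClaimTemplate named "www"
```

Distinct hostPath directories keep two replicas scheduled on the same node from opening the same repo.lock. hostPath still ties the data to whichever node the pod happens to run on; a local PV with nodeAffinity is the usual way to pin each volume to a specific node.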

Here is the kubectl describe output for the failing pod:

Name:         ipfs-3
Namespace:    ipfs
Priority:     0
Node:         swift-153/10.70.20.153
Start Time:   Tue, 27 Oct 2020 14:38:11 -0400
Labels:       app=ipfs
              controller-revision-hash=ipfs-74bb88dbb6
              statefulset.kubernetes.io/pod-name=ipfs-3
Annotations:  <none>
Status:       Running
IP:           10.244.3.43
IPs:
  IP:           10.244.3.43
Controlled By:  StatefulSet/ipfs
Containers:
  ipfs:
    Container ID:  docker://81349e969be9ffcafeb4d65adf9d0b2de7311e46068e36dd4f227f169f6dfcab
    Image:         ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
    Image ID:      docker-pullable://ipfs/go-ipfs@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054
    Ports:         4001/TCP, 5001/TCP, 8080/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      ipfs
      daemon
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 27 Oct 2020 14:39:51 -0400
      Finished:     Tue, 27 Oct 2020 14:39:51 -0400
    Ready:          False
    Restart Count:  4
    Environment:
      IPFS_LOGGING:  debug
    Mounts:
      /data/ipfs from www (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hb785 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ipfs
    Optional:    false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ipfs-config
    Optional:  false
  www:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ipfs-pvc
    ReadOnly:   false
  default-token-hb785:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hb785
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Normal   Scheduled         2m24s                  default-scheduler  Successfully assigned ipfs/ipfs-3 to swift-153
  Normal   Pulled            2m2s (x3 over 2m21s)   kubelet            Container image "ipfs/go-ipfs:v0.4.11@sha256:e977e1560b960933061efc694c937d711ce1a51aa4a5239acfdff01504b11054" already present on machine
  Normal   Created           2m (x3 over 2m19s)     kubelet            Created container ipfs
  Normal   Started           2m (x3 over 2m19s)     kubelet            Started container ipfs
  Warning  DNSConfigForming  103s (x10 over 2m24s)  kubelet            Search Line limits were exceeded, some search paths have been omitted, the applied search line is: ipfs.svc.cluster.local svc.cluster.local cluster.local search syslab.sandbox cs.toronto.edu
  Warning  BackOff           103s (x6 over 2m15s)   kubelet            Back-off restarting failed container