Quorum nodes run fine with docker-compose, but crash when deployed on kubernetes


Following the steps outlined here, I created a basic Quorum network with 4 nodes and IBFT consensus. I then created a docker image for each node, copying the contents of that node's directory onto the image. Each image is based on the official quorumengineering/quorum image and, when started as a container, executes the geth command. An example Dockerfile follows (different nodes use different --rpcport/--port values):

FROM quorumengineering/quorum
WORKDIR /opt/node
COPY . /opt/node
ENTRYPOINT []
CMD PRIVATE_CONFIG=ignore nohup geth --datadir data --nodiscover --istanbul.blockperiod 5 --syncmode full --mine --minerthreads 1 --verbosity 5 --networkid 10 --rpc --rpcaddr 0.0.0.0 --rpcport 22001 --rpcapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3,quorum,istanbul  --rpcvhosts="*" --emitcheckpoints --port 30304
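
Each image is built from the corresponding node's directory and pushed, roughly as follows (the node0/ build-context directory name is just illustrative):

# Build and push one node's image; the same is repeated for qnode1..qnode3
docker build -t <myDockerHub>/qnode0 node0/
docker push <myDockerHub>/qnode0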

I then made a docker-compose file to run the images.

version: '2'
volumes:
  qnode0-data:
  qnode1-data:
  qnode2-data:
  qnode3-data:
services:
  qnode0:
    container_name: qnode0
    image: <myDockerHub>/qnode0
    ports:
      - 22000:22000
      - 30303:30303
    volumes:
      - qnode0-data:/opt/node
  qnode1:
    container_name: qnode1
    image: <myDockerHub>/qnode1
    ports:
      - 22001:22001
      - 30304:30304
    volumes:
      - qnode1-data:/opt/node
  qnode2:
    container_name: qnode2
    image: <myDockerHub>/qnode2
    ports:
      - 22002:22002
      - 30305:30305
    volumes:
      - qnode2-data:/opt/node
  qnode3:
    container_name: qnode3
    image: <myDockerHub>/qnode3
    ports:
      - 22003:22003
      - 30306:30306
    volumes:
      - qnode3-data:/opt/node
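
For completeness, I bring the network up locally with the usual compose workflow, e.g.:

# Start all four nodes in the background and tail one node's output
docker-compose up -d
docker logs -f qnode0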

When running these images locally with docker-compose, the nodes start and I can even see the created blocks via a blockchain explorer. However, when I try to run this in a kubernetes cluster, either locally with minikube or on AWS, the nodes do not start but crash. To deploy on kubernetes I made the following three yaml files for each node (12 files in total):

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: qnode0
  name: qnode0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qnode0
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: qnode0
    spec:
      containers:
      - image: <myDockerHub>/qnode0
        imagePullPolicy: ""
        name: qnode0
        ports:
        - containerPort: 22000
        - containerPort: 30303
        resources: {}
        volumeMounts:
        - mountPath: /opt/node
          name: qnode0-data
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
      - name: qnode0-data
        persistentVolumeClaim:
          claimName: qnode0-data
status: {}

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: qnode0-service
spec:
  selector:
    app: qnode0
  ports:
    - name: rpcport
      protocol: TCP
      port: 22000
      targetPort: 22000
    - name: netlistenport
      protocol: TCP
      port: 30303
      targetPort: 30303

persistentvolumeclaim.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: qnode0-data
  name: qnode0-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
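
The 12 files are applied in the usual way, e.g. (assuming the three files for each node live in a per-node directory such as qnode0/):

# Create the PersistentVolumeClaim, Deployment and Service for one node, then check the pods
kubectl apply -f qnode0/
kubectl get pods
kubectl logs -l app=qnode0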

When trying to run on a kubernetes cluster, each node runs into this error:

ERROR[] Cannot start mining without etherbase    err="etherbase must be explicitly specified"
Fatal: Failed to start mining: etherbase missing: etherbase must be explicitly specified
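
To dig into this, one can compare what the image ships with what the running pod actually sees under /opt/node (a hypothetical diagnostic; <qnode0-pod> is whatever kubectl get pods reports):

# The image's ENTRYPOINT is empty, so ls runs directly in place of the CMD
docker run --rm <myDockerHub>/qnode0 ls /opt/node
# The same path inside the pod, after the persistent volume is mounted
kubectl exec <qnode0-pod> -- ls /opt/node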

This error does not occur when running locally with docker-compose. Examining the logs, I noticed a difference in how the nodes start up locally versus on kubernetes.

Locally, I see the following lines in each node's output:

INFO [] Initialising Ethereum protocol           name=istanbul versions="[99 64]" network=10 dbversion=7
...
DEBUG[] InProc registered                        namespace=istanbul

On kubernetes (either in minikube or on AWS), these lines read differently:

INFO [] Initialising Ethereum protocol           name=eth versions="[64 63]" network=10 dbversion=7
...
DEBUG[] IPC registered                           namespace=eth
DEBUG[] IPC registered                           namespace=ethash

Why is this happening? What is the significance of name=istanbul/eth? My common-sense reading is that the error happens because name=eth is used instead of name=istanbul, but I don't know what this signifies, and more importantly, I don't know what I did to inadvertently affect the kubernetes deployment.

Any ideas how to fix this?

EDIT

I tried to address what David Maze mentioned in his comment, i.e. that the node directory gets overwritten, so I created a new directory in the image with

RUN mkdir /opt/nodedata/

and used that as the volume mount point in kubernetes. I also switched from Deployments to StatefulSets. The relevant yaml follows:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qnode0
spec:
  serviceName: qnode0
  replicas: 1
  selector:
    matchLabels:
      app: qnode0
  template:
    metadata:
      labels:
        app: qnode0
    spec:
      containers:
      - image: <myDockerHub>/qnode0
        imagePullPolicy: ""
        name: qnode0
        ports:
        - protocol: TCP
          containerPort: 22000
        - protocol: TCP
          containerPort: 30303
        volumeMounts:
        - mountPath: /opt/nodedata
          name: qnode0-data
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
      - name: qnode0-data
        persistentVolumeClaim:
          claimName: qnode0-data

Changing the volume mount immediately produced the correct behaviour of

INFO [] Initialising Ethereum protocol           name=istanbul

However, I had networking issues, which I solved by using the environment variables that kubernetes sets for each service; these include the IP each service is reachable at, e.g.:

QNODE0_PORT_30303_TCP_ADDR=172.20.115.164

I also changed my kubernetes services a little, as follows:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: qnode0
  name: qnode0
spec:
  ports:
  - name: "22000"
    port: 22000
    targetPort: 22000
  - name: "30303"
    port: 30303
    targetPort: 30303
  selector:
    app: qnode0

Using the environment variables to properly initialise the quorum files solved the networking problem.
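
The kind of substitution I mean looks roughly like this at container start (the static-nodes.json path and the placeholder tokens are illustrative rather than my exact files; the second variable name follows the same pattern kubernetes uses for the qnode1 service):

# Hypothetical startup step: inject the service IPs kubernetes provides
# into the enode entries before geth starts
sed -i "s/__QNODE0_IP__/${QNODE0_PORT_30303_TCP_ADDR}/g" /opt/node/data/static-nodes.json
sed -i "s/__QNODE1_IP__/${QNODE1_PORT_30304_TCP_ADDR}/g" /opt/node/data/static-nodes.json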

However, when I delete my StatefulSets and services with:

kubectl delete -f <my_statefulset_and_service_yamls>

and then apply them again:

kubectl apply -f <my_statefulset_and_service_yamls>

Quorum starts from scratch, i.e. it does not continue block creation from where it left off but starts again from block 1, as follows:

Inserted new block number=1 hash=1c99d0…fe59bb

So the state of the blockchain is not saved, as was my initial fear. What should I do to address this?
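
One thing I can still check is where the chain data actually ends up: geth keeps its chain database under <datadir>/geth, and assuming the Dockerfile's --datadir data (relative to WORKDIR /opt/node) was left unchanged, that is /opt/node/data rather than the mounted /opt/nodedata. For example (the pod name comes from the StatefulSet above):

# Is the chain database under the baked-in directory or under the mounted volume?
kubectl exec qnode0-0 -- ls /opt/node/data/geth
kubectl exec qnode0-0 -- ls /opt/nodedata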
