Following the steps outlined here, I created a basic Quorum network with 4 nodes and IBFT consensus. I then created a Docker image for each node, copying the contents of that node's directory onto the image. The image was built from the official quorumengineering/quorum
image, and when started as a container it runs the geth command. An example Dockerfile follows (the nodes differ only in their --rpcport/--port values):
FROM quorumengineering/quorum
WORKDIR /opt/node
COPY . /opt/node
ENTRYPOINT []
CMD PRIVATE_CONFIG=ignore nohup geth --datadir data --nodiscover --istanbul.blockperiod 5 --syncmode full --mine --minerthreads 1 --verbosity 5 --networkid 10 --rpc --rpcaddr 0.0.0.0 --rpcport 22001 --rpcapi admin,db,eth,debug,miner,net,shh,txpool,personal,web3,quorum,istanbul --rpcvhosts="*" --emitcheckpoints --port 30304
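Each image is then built from the corresponding node directory and pushed, roughly like this (the directory layout and the <myDockerHub> tag are placeholders):
# build and push one image per node directory (names here are placeholders)
docker build -t <myDockerHub>/qnode0 node0/
docker push <myDockerHub>/qnode0
# ...repeated for qnode1, qnode2 and qnode3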
I then made a docker-compose file to run the images.
version: '2'
volumes:
  qnode0-data:
  qnode1-data:
  qnode2-data:
  qnode3-data:
services:
  qnode0:
    container_name: qnode0
    image: <myDockerHub>/qnode0
    ports:
      - 22000:22000
      - 30303:30303
    volumes:
      - qnode0-data:/opt/node
  qnode1:
    container_name: qnode1
    image: <myDockerHub>/qnode1
    ports:
      - 22001:22001
      - 30304:30304
    volumes:
      - qnode1-data:/opt/node
  qnode2:
    container_name: qnode2
    image: <myDockerHub>/qnode2
    ports:
      - 22002:22002
      - 30305:30305
    volumes:
      - qnode2-data:/opt/node
  qnode3:
    container_name: qnode3
    image: <myDockerHub>/qnode3
    ports:
      - 22003:22003
      - 30306:30306
    volumes:
      - qnode3-data:/opt/node
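For reference, the whole network is then started locally in the usual way:
# start all four node containers in the background and follow one node's logs
docker-compose up -d
docker-compose logs -f qnode0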
When running these images locally with docker-compose, the nodes start and I can even see the created blocks via a blockchain explorer. However, when I try to run this in a Kubernetes cluster, either locally with Minikube or on AWS, the nodes do not start but crash. To deploy on Kubernetes I made the following three YAML files for each node (12 files in total):
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: qnode0
  name: qnode0
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qnode0
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: qnode0
    spec:
      containers:
        - image: <myDockerHub>/qnode0
          imagePullPolicy: ""
          name: qnode0
          ports:
            - containerPort: 22000
            - containerPort: 30303
          resources: {}
          volumeMounts:
            - mountPath: /opt/node
              name: qnode0-data
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
        - name: qnode0-data
          persistentVolumeClaim:
            claimName: qnode0-data
status: {}
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: qnode0-service
spec:
  selector:
    app: qnode0
  ports:
    - name: rpcport
      protocol: TCP
      port: 22000
      targetPort: 22000
    - name: netlistenport
      protocol: TCP
      port: 30303
      targetPort: 30303
persistentvolumeclaim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: qnode0-data
  name: qnode0-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
When trying to run on a Kubernetes cluster, each node runs into this error:
ERROR[] Cannot start mining without etherbase err="etherbase must be explicitly specified"
Fatal: Failed to start mining: etherbase missing: etherbase must be explicitly specified
which does not occur when running locally with docker-compose. After examining the logs, I noticed a difference in how the nodes start up locally with docker-compose versus on Kubernetes, namely the following lines:
Locally, I see the following lines in each node's output:
INFO [] Initialising Ethereum protocol name=istanbul versions="[99 64]" network=10 dbversion=7
...
DEBUG[] InProc registered namespace=istanbul
On Kubernetes (either in Minikube or on AWS), these lines look different:
INFO [] Initialising Ethereum protocol name=eth versions="[64 63]" network=10 dbversion=7
...
DEBUG[] IPC registered namespace=eth
DEBUG[] IPC registered namespace=ethash
Why is this happening? What is the significance of name=istanbul vs name=eth? My common-sense guess is that the error happens because of the use of name=eth instead of name=istanbul, but I don't know what this means, and more importantly, I don't know what I did to inadvertently affect the Kubernetes deployment.
Any ideas how to fix this?
EDIT
I tried to address what David Maze mentioned in his comment, i.e. that the node directory gets overwritten, so I created a new directory in the image with
RUN mkdir /opt/nodedata/
and used that as the volume mount point in Kubernetes. I also used StatefulSets instead of Deployments. The relevant YAML follows:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qnode0
spec:
  serviceName: qnode0
  replicas: 1
  selector:
    matchLabels:
      app: qnode0
  template:
    metadata:
      labels:
        app: qnode0
    spec:
      containers:
        - image: <myDockerHub>/qnode0
          imagePullPolicy: ""
          name: qnode0
          ports:
            - protocol: TCP
              containerPort: 22000
            - protocol: TCP
              containerPort: 30303
          volumeMounts:
            - mountPath: /opt/nodedata
              name: qnode0-data
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
        - name: qnode0-data
          persistentVolumeClaim:
            claimName: qnode0-data
Changing the volume mount immediately produced the correct behaviour:
INFO [] Initialising Ethereum protocol name=istanbul
However, I then had networking issues, which I solved by using the environment variables that Kubernetes sets for each Service, which include the IP each Service is reachable at, e.g.:
QNODE0_PORT_30303_TCP_ADDR=172.20.115.164
I also changed my Kubernetes Services a little, as follows:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: qnode0
  name: qnode0
spec:
  ports:
    - name: "22000"
      port: 22000
      targetPort: 22000
    - name: "30303"
      port: 30303
      targetPort: 30303
  selector:
    app: qnode0
Using these environment variables to properly initialise the Quorum files solved the networking problem.
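A rough sketch of what I mean by that, where the QNODE0_IP placeholder token and the file path are just my own convention:
# sketch: before starting geth, substitute the Service IP that Kubernetes
# exposes as QNODE0_PORT_30303_TCP_ADDR into static-nodes.json
sed -i "s/QNODE0_IP/${QNODE0_PORT_30303_TCP_ADDR}/g" data/static-nodes.json
# ...and likewise for the other nodes' corresponding variables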
However, when I delete my StatefulSets and Services with:
kubectl delete -f <my_statefulset_and_service_yamls>
and then apply them again:
kubectl apply -f <my_statefulset_and_service_yamls>
Quorum starts from scratch, i.e. it does not continue block creation from where it stopped but starts from block 1 again, as follows:
Inserted new block number=1 hash=1c99d0…fe59bb
So the state of the blockchain is not saved, as was my initial fear. What should I do to address this?
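For what it's worth, the PersistentVolumeClaims are not among the YAMLs I delete, so they should survive the delete/apply cycle; one way I can check whether the chain data actually ends up on the mounted path (the pod name follows the usual StatefulSet naming):
kubectl get pvc
# the StatefulSet qnode0 with one replica creates the pod qnode0-0
kubectl exec qnode0-0 -- ls -la /opt/nodedata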