To test whether a file in an emptyDir volume is synchronized between containers, I used tail to observe the same file from two containers, and I stumbled upon the following behavior:
Pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: fortune
spec:
  containers:
  - image: luksa/fortune
    name: html-generator
    volumeMounts:
    - name: html
      mountPath: /var/htdocs
  - image: nginx:alpine
    name: web-server
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html
      readOnly: true
    ports:
    - containerPort: 80
      protocol: TCP
  volumes:
  - name: html
    emptyDir: {}
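To try it out (assuming the manifest above is saved as fortune-pod.yaml, a name chosen here just for illustration):

kubectl apply -f fortune-pod.yaml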
The example is taken from the book Kubernetes in Action by Marko Luksa. The luksa/fortune image writes fortune text to the file /var/htdocs/index.html inside the html-generator container; every 10 seconds the file is rewritten with fresh output from fortune.
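For context, the generator presumably runs a loop along these lines (a sketch; the actual script inside luksa/fortune may differ):

#!/bin/sh
# Rewrite index.html with a fresh fortune every 10 seconds.
# The redirection truncates and rewrites the file in place, so the
# file is briefly empty or partial during each write.
while true; do
  fortune > /var/htdocs/index.html
  sleep 10
done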
Tailing the same file in both containers sometimes shows incomplete output in the web-server container.
Part of the html-generator container output:
kubectl exec -c html-generator -it fortune -- tail -f /var/htdocs/index.html
The very ink with which all history is written is merely fluid prejudice.
-- Mark Twain
Part of the web-server container output:
kubectl exec -c web-server -it fortune -- tail -f /usr/share/nginx/html/index.html
h all history is written is merely fluid prejudice.
-- Mark Twain
Question: is this caused by
1. tail
2. slow IO speed of the node disk
3. Kubernetes volume sync logic
4. something else?
PS: I also noticed that cURLing the web server's port while index.html is being written to causes nginx to return an empty response body.
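For what it's worth, the empty body is easy to reproduce from inside the pod (a sketch; nginx:alpine ships BusyBox wget, so no curl is needed in the container):

kubectl exec -c web-server fortune -- sh -c \
  'while true; do wget -qO- http://localhost/ | wc -c; sleep 0.2; done'
# The byte count occasionally drops to 0 when a request races the rewrite.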
The incomplete output from the web-server container is caused by the nginx:alpine image used in the pod definition. When you change the image from nginx:alpine to nginx, the issue disappears, because the two images ship different tail binaries: Alpine provides the BusyBox tail applet, while the Debian-based nginx image provides GNU coreutils tail, and the two behave differently while the watched file is being truncated and rewritten.
Kubernetes volume sync logic seems unlikely to cause the issue: an emptyDir volume is simply a directory on the node's filesystem that both containers mount, so there is no synchronization layer involved. As written in the emptyDir documentation:

The partition created by emptyDir is ephemeral and applications cannot expect any performance SLAs (Disk IOPS for example) from this partition.

So "2. slow IO speed of the node disk" could also cause such an issue, but based on the reproduction, where changing the image seemed to solve it, that cause can most likely be excluded.
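Not part of the original answer, but the PS hints at the underlying cause: the generator truncates and rewrites index.html in place, so any reader (tail, nginx) can catch the file half-written. A common mitigation, sketched below, is to write to a temporary file and rename it into place, since a rename within the same filesystem is atomic:

#!/bin/sh
# Sketch: atomic variant of the generator loop.
# Readers see either the old file or the complete new one, never a partial write.
while true; do
  fortune > /var/htdocs/index.html.tmp
  mv /var/htdocs/index.html.tmp /var/htdocs/index.html
  sleep 10
done

Note that tail -f keeps following the original inode after a rename, so to keep watching the name you would use tail -F instead.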