I was asked this question in an interview and i m not sure of the correct answer hence I would like your suggestions.
I was asked whether we should persist production critical data inside of the docker instance or outside of it? What would be my choice and the reasons for it.
Would your answer differ incase we have a non-prod non critical data ?
Back your answers with reasons.
Most data should be managed externally to containers and container images. I tend to view data constrained to a container as temporary (intermediate|discardable) data. Otherwise, if it's being captured but it's not important to my business, why create it?
The name "container" is misleading. Containers aren't like VMs where there's a strong barrier (isolation) between VMs. When you run multiple containers on a single host, you can enumerate all their processes using
ps aux
on the host.There are good arguments for maintaining separation between processes and data and running both within a single container makes it more challenging to retain this separation.
Unlike processes, files in container layers are more isolated though. Although the layers are manifest as files on the host OS, you can't simply
ls
a container layer's files from the host OS. This makes accessing the data in a container more complex. There's also a performance penalty for effectively running a file system atop another file system.While it's common and trivial to move container images between machines (viz
docker push
anddocker pull
), it's less easy to move containers between machines. This isn't generally a problem for moving processes as these (config aside) are stateless and easy to move and recreate, but your data is state and you want to be able to move this data easily (for backups, recovery) and increasingly to move amongst a dynamic pool of nodes that perform processing upon it.Less importantly but not unimportantly, it's relatively easy to perform the equivalent of a
rm -rf *
with Docker by removing containers (docker container rm ...
) and thereby deleting the application and your data.