We are trying to mount a Lustre filesystem inside a running container, and have done this successfully in containers running in privileged mode.
However, in containers running in non-privileged mode, mounting Lustre fails, even when all the capabilities Linux provides (several dozen of them) are added!
So:
- What is the difference between "privileged: true" and adding all capabilities with "cap_add"?
- Why does mounting Lustre still fail when all capabilities have been added to the container?
Non-privileged mode container:
version: "3"
services:
  aiart:
    cap_add:
      - AUDIT_CONTROL
      - AUDIT_READ
      - AUDIT_WRITE
      - BLOCK_SUSPEND
      - CHOWN
      - DAC_OVERRIDE
      - DAC_READ_SEARCH
      - FOWNER
      - FSETID
      - IPC_LOCK
      - IPC_OWNER
      - KILL
      - LEASE
      - LINUX_IMMUTABLE
      - MAC_ADMIN
      - MAC_OVERRIDE
      - MKNOD
      - NET_ADMIN
      - NET_BIND_SERVICE
      - NET_BROADCAST
      - NET_RAW
      - SETGID
      - SETFCAP
      - SETPCAP
      - SETUID
      - SYS_ADMIN
      - SYS_BOOT
      - SYS_CHROOT
      - SYS_MODULE
      - SYS_NICE
      - SYS_PACCT
      - SYS_PTRACE
      - SYS_RAWIO
      - SYS_RESOURCE
      - SYS_TIME
      - SYS_TTY_CONFIG
      - SYSLOG
      - WAKE_ALARM
    image: test_lustre:1.1
    #privileged: true
    ports:
      - "12345:12345"
    volumes:
      - /home/wallace/test-lustre/docker/lustre-client:/lustre/lustre-client
The difference between --privileged and adding all capabilities is that the --privileged flag removes the limitations enforced by the device cgroup controller and disables the security enhancements, while providing access to all devices. A privileged container truly becomes part of the host operating system and can even reach the AppArmor and SELinux configuration; confinement such as SELinux labels might not be applied to it.

When the --privileged flag is used, no extra security is enforced on the container, and kernel filesystems are not mounted read-only into the container. Seccomp filtering is disabled as well. Still, you can't get more power than the current namespace allows, for example if you are running a rootless daemon.

Capabilities are a way to adjust the power of root, but some security enhancements still apply when the container is executed.
A great blog post on this by Red Hat is available here.
As pointed out in the other answer, AppArmor is probably the issue in this case, and running the container with the --security-opt apparmor=unconfined flag might make the mount possible. However, that should be used only temporarily.
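For the compose file in the question, the same idea could be sketched roughly as follows. This is a minimal, untested sketch: it assumes AppArmor really is what blocks the mount, and that SYS_ADMIN (the capability the mount syscall requires) is enough on its own. Lustre may need more than this, for example kernel modules already loaded on the host.

```yaml
version: "3"
services:
  aiart:
    image: test_lustre:1.1
    # Assumption: mount(2) only needs CAP_SYS_ADMIN; add others only if a test shows they are required
    cap_add:
      - SYS_ADMIN
    # Compose equivalent of --security-opt; lifts AppArmor confinement for this container only
    security_opt:
      - apparmor=unconfined
    ports:
      - "12345:12345"
    volumes:
      - /home/wallace/test-lustre/docker/lustre-client:/lustre/lustre-client
```

If the mount works with this, a narrower option than leaving the container unconfined would be a custom AppArmor profile that permits just the mount operation.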