Running a docker container without root, with podman

I was able to run a container without root privileges, in a way that I think may work with NFS/GPFS (batch) in specific cases. I'm not claiming we should do this; I just wanted to document the steps.

This is more or less what I did. The host was CentOS 7 (release 7.7).

As a system admin:
  0. yum install -y slirp4netns podman

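  1. Add a subordinate UID range for the user to /etc/subuid, e.g. echo "$USER:$(( $(id -u "$USER") * 10000 )):10000" >> /etc/subuid (with USER set to the target user's login). newuidmap, from shadow-utils, is the setuid helper that applies these ranges at run time; it does not edit /etc/subuid.
    • Assumptions: say I'm uid 10234
    • I want 10k possible UIDs per user
    • We are reasonably certain the total number of UIDs active is less than 1 million
      • The subordinate ranges must not clash with your existing UID space
    • echo "bvan:102340000:10000" >> /etc/subuid
  2. Do the same for /etc/subgid, e.g. echo "$USER:$(( $(id -g "$USER") * 10000 )):10000" >> /etc/subgid (newgidmap is the matching helper for GIDs)
    • Similar assumptions to (1), although many systems have many more GIDs than UIDs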
  3. Reboot (there might be another way of picking up the /etc/subuid and /etc/subgid changes)
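  4. echo 16000 > /proc/sys/user/max_user_namespaces
    • This needs root and does not persist across reboots (putting user.max_user_namespaces=16000 in a file under /etc/sysctl.d/ would make it persistent)
    • Per the kernel's sysctl documentation, this is the maximum number of user namespaces any one user may create; CentOS 7 defaults it to 0, which is why it has to be raised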
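The ID arithmetic in these steps can be sketched as a few shell functions. This is a minimal sketch under stated assumptions: the helper names (subid_entry, subid_fits_32bit, subid_above_real_ids, max_id_for_width) are hypothetical, the default width of 10000 follows the scheme above, and the functions only print or test values rather than writing /etc/subuid or /etc/subgid.

```bash
#!/usr/bin/env bash
# Sketch of the "id * width" subordinate-ID scheme from the steps above.
# Hypothetical helpers: they only print/test values; nothing writes /etc/subuid or /etc/subgid.

MAX_ID=4294967295   # 2^32 - 1, the top of the 32-bit UID/GID space

# subid_entry NAME ID [WIDTH] -> prints an /etc/subuid-style line "NAME:ID*WIDTH:WIDTH"
subid_entry() {
  local name=$1 id=$2 width=${3:-10000}
  echo "${name}:$(( id * width )):${width}"
}

# subid_fits_32bit ID [WIDTH] -> succeeds if ID*WIDTH .. ID*WIDTH+WIDTH-1 stays within 32 bits
subid_fits_32bit() {
  local id=$1 width=${2:-10000}
  [ $(( id * width + width - 1 <= MAX_ID )) -eq 1 ]
}

# subid_above_real_ids ID HIGHEST_REAL_UID [WIDTH] -> succeeds if the range starts above every real UID
subid_above_real_ids() {
  local id=$1 highest=$2 width=${3:-10000}
  [ $(( id * width > highest )) -eq 1 ]
}

# max_id_for_width [WIDTH] -> largest ID whose whole range still fits within 32 bits
max_id_for_width() {
  local width=${1:-10000}
  echo $(( (MAX_ID - width + 1) / width ))
}
```

For example, subid_entry bvan 10234 prints bvan:102340000:10000, matching step 1. max_id_for_width prints 429495, so with 10000 IDs per user only real UIDs up to 429495 get a range that fits in 32 bits. max_id_for_width 65536 prints 65535, which is the 65536*65536 limit mentioned in the notes below.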

Running

# NOTE: MUST RUN AS ROOT IN CONTAINER!
# --security-opt label=disable turns off SELinux labeling, which misbehaved on a system I tested (https://github.com/containers/libpod/issues/3683)
$ podman run -it --rm -v /home/$USER:/root/backup  --security-opt label=disable docker.io/centos:7 /bin/bash 

Notes:

  • I have no idea what the proper handling/management of /etc/subuid and /etc/subgid is.
  • If you switch to another uid inside the container, you start at the first value in /etc/subuid: container UID 1 maps to that value, UID 2 to the next one, and so on. I started the range at uid*10000, so you get up to 10k UIDs in a container.
  • If the container image contains a file owned by a UID greater than the range size (the third column), podman will balk at running the container. 10k seems fine for many containers.
  • If you chose a range size of 65536, you must have fewer than 65536 users/allocated UIDs on your system, since 65536*65536 = 2^32 exhausts the 32-bit UID space.
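The mapping described in these notes can be expressed as a small shell function. This is a minimal sketch, assuming rootless podman's default layout where container UID 0 is your own UID and container UIDs 1..count come from your /etc/subuid range; subid_range and host_uid_for are hypothetical names, and the file path is a parameter so it can be pointed at a copy.

```bash
#!/usr/bin/env bash
# subid_range NAME [FILE] -> prints "START COUNT" for NAME from an /etc/subuid-style file.
# Assumption: the first field is a login name (subuid(5) also allows a numeric UID there).
subid_range() {
  local name=$1 file=${2:-/etc/subuid}
  awk -F: -v u="$name" '$1 == u { print $2, $3; exit }' "$file"
}

# host_uid_for CONTAINER_UID HOST_UID START COUNT
# Prints the on-disk (host) UID for a container UID, or fails if it cannot be mapped.
host_uid_for() {
  local cuid=$1 host_uid=$2 start=$3 count=$4
  if [ "$cuid" -eq 0 ]; then
    echo "$host_uid"                    # container root is your own UID
  elif [ "$cuid" -ge 1 ] && [ "$cuid" -le "$count" ]; then
    echo $(( start + cuid - 1 ))        # UID 1 -> first subordinate ID, and so on
  else
    return 1                            # beyond the third column: no mapping exists
  fi
}
```

For example, with the /etc/subuid line from the session below (bvan:111260000:10000), subid_range bvan prints 111260000 10000, and host_uid_for 1000 "$(id -u)" 111260000 10000 prints 111260999: a file written by container UID 1000 lands on disk owned by 111260999.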

Here is an output of a terminal to show some of that in action.

[bvan@podman-test ~]$ cat /proc/sys/user/max_user_namespaces 
16000
[bvan@podman-test ~]$ cat /etc/subuid 
bvan:111260000:10000
[bvan@podman-test ~]$ cat /etc/subgid 
bvan:10890000:10000
[bvan@podman-test ~]$ podman run -it --rm -v /home/$USER:/root/backup  --security-opt label=disable docker.io/centos:7 /bin/bash
### INSIDE OF POD
[root@75f3d5708409 /]# cd /root/backup/
[root@75f3d5708409 backup]# ls -al
total 90
drwx------. 10 root root  4096 Mar 12 23:41 .
dr-xr-x---.  3 root root  4096 Mar 12 23:43 ..
-rw-------.  1 root root 20530 Mar 12 23:42 .bash_history
-rw-------.  1 root root    18 Jun 21  2018 .bash_logout
-rw-------.  1 root root   193 Jun 21  2018 .bash_profile
-rw-------.  1 root root   231 Jun 21  2018 .bashrc
drwxr-xr-x.  3 root root  4096 Feb 22 00:17 .cache
drwxr-xr-x.  4 root root  4096 Feb 22 00:17 .config
drwx------.  3 root root  4096 Feb 26 16:15 .emacs.d
-rw-------.  1 root root    56 Feb 24 14:30 .lesshst
drwx------.  3 root root  4096 Feb  5 00:34 .local
drwxr-----.  3 root root  4096 Jun 21  2018 .pki
drwx------.  2 root root  4096 Feb 22 01:05 .ssh
-rw-------.  1 root root   658 Jun 21  2018 .zshrc
-rw-r--r--.  1 root root     4 Mar 12 23:41 bar
[root@75f3d5708409 backup]# rm bar
rm: remove regular file 'bar'? y
[root@75f3d5708409 backup]# echo "hi how are you" > hello.txt
[root@75f3d5708409 backup]# ls -al 
total 90
drwx------. 10 root root  4096 Mar 12 23:43 .
dr-xr-x---.  3 root root  4096 Mar 12 23:43 ..
-rw-------.  1 root root 20530 Mar 12 23:42 .bash_history
-rw-------.  1 root root    18 Jun 21  2018 .bash_logout
-rw-------.  1 root root   193 Jun 21  2018 .bash_profile
-rw-------.  1 root root   231 Jun 21  2018 .bashrc
drwxr-xr-x.  3 root root  4096 Feb 22 00:17 .cache
drwxr-xr-x.  4 root root  4096 Feb 22 00:17 .config
drwx------.  3 root root  4096 Feb 26 16:15 .emacs.d
-rw-------.  1 root root    56 Feb 24 14:30 .lesshst
drwx------.  3 root root  4096 Feb  5 00:34 .local
drwxr-----.  3 root root  4096 Jun 21  2018 .pki
drwx------.  2 root root  4096 Feb 22 01:05 .ssh
-rw-------.  1 root root   658 Jun 21  2018 .zshrc
-rw-r--r--.  1 root root    15 Mar 12 23:43 hello.txt
[root@75f3d5708409 backup]# exit
### OUTSIDE OF POD
[bvan@podman-test ~]$ ls -al hello.txt 
-rw-r--r--. 1 bvan gl 15 Mar 12 16:43 hello.txt

Just to reiterate: as soon as you switch away from root inside the container, you jump into your subordinate UID allotment, starting at its minimum. Files are then stored on disk under that UID instead of your own. This is fine for local files, but it does not work for NFS/GPFS.
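One way to spot this effect is to look for files under the bind-mounted directory that are no longer owned by your own UID. A minimal sketch, assuming a POSIX find; foreign_owned is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# foreign_owned DIR -> lists paths under DIR whose owner is not the invoking user.
# Files written by a non-root container user land on disk under a subordinate UID
# (111260000 and up in the session above), so they show up here.
# POSIX find treats a numeric -user argument as a UID when it is not a user name.
foreign_owned() {
  find "$1" ! -user "$(id -u)" -print
}
```

Run it on the bind-mounted directory after a container session, e.g. foreign_owned /home/$USER. Files created as container root (like hello.txt above) are owned by you and are not listed.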

See also:

@brianv0 is this addressing a problem specific to running Docker images directly (i.e. using docker run)? In our Kubernetes deployments that write to GPFS volumes, we are typically running the container as an unprivileged user ID using the securityContext option in the Pod spec:

...
      containers:
      - name: {{ container_name }}
        securityContext:
          runAsUser: 1001
          runAsGroup: 1001
        image: {{ image }}
        command: {{ command }}
...

The resulting files for one such deployment look like:

$ ls -latr /des002/deslabs/desapps/namespaces/desdev2/deslabs-legacy/tasks/query/jtest/
total 96
drwxr-xr-x 4 root root  512 Mar 12 10:16 ..
-rw-r--r-- 1 1001 1001  969 Mar 12 10:16 query-4bb3c2a28eaf42589943cd8321b9a051-jtest.log
-rw-r--r-- 1 1001 1001  966 Mar 12 10:22 query-d9f6f1bf0da848b7b6271a4b2ef30506-jtest.log
-rw-r--r-- 1 1001 1001  966 Mar 12 10:32 query-0912bce74b20449c847b1ad2fe96a667-jtest.log
...

@andrew.manning - Yes exactly, this is more related to that specific use case: giving a user direct access to a command that can run containers. Historically that has meant access to the docker command, which is very privileged. This is more applicable on shared systems like our verification cluster (slurm) or other shared developer resources (lsst-devXX). It is an alternative to something like Singularity, and a more secure one at that.

It may have some use in Kubernetes, but I'm not sure what that integration looks like today, and I wouldn't recommend trying to mix this with GPFS/NFS and Kubernetes. securityContext prevents containers from switching UIDs when running, which can make it difficult for some containers to run, but this is becoming less of a problem.