Namespaces & cgroups: Containers Guide

Docker, Podman and Kubernetes all run containers built mainly on two Linux kernel features: namespaces, which isolate what a process can see, and cgroups, which limit how much it can use. The kernel has no “container” abstraction. A container is an ordinary process with a particular combination of namespaces and cgroups applied.

Namespaces: isolation

A namespace makes a process see its own private version of a system resource. Linux has eight kinds:

Namespace	Isolates
PID	Process IDs (the container's main process is PID 1)
Mount	Filesystem mounts (the container has its own /)
Network	Network interfaces, routes, firewall rules, sockets
UTS	Hostname and NIS domain name
IPC	System V IPC objects (shared memory, semaphores, message queues) and POSIX message queues
User	UIDs and GIDs (root in the container can map to an unprivileged host user)
cgroup	The cgroup root (the container sees its own cgroup as /)
Time	Offsets for CLOCK_MONOTONIC and CLOCK_BOOTTIME
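The names in the table differ slightly from the ones the kernel and tools use: /proc/PID/ns calls the mount and network namespaces mnt and net, and unshare has one option per type. A small lookup sketch; unshare_flag is a made-up name, and it assumes util-linux unshare (the --time option needs a recent util-linux and Linux 5.6 or later).

```shell
#!/usr/bin/env bash
# Map a /proc/PID/ns entry name to the unshare(1) option that creates a
# new namespace of that type. unshare_flag is a hypothetical helper name.
unshare_flag() {
    case $1 in
        cgroup) echo --cgroup ;;
        ipc)    echo --ipc ;;
        mnt)    echo --mount ;;
        net)    echo --net ;;
        pid)    echo --pid ;;   # combine with --fork so the child becomes PID 1
        time)   echo --time ;;
        user)   echo --user ;;
        uts)    echo --uts ;;
        *)      echo "unknown namespace: $1" >&2; return 1 ;;
    esac
}
```

For example, `unshare_flag mnt` prints `--mount`.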

See your namespaces

# Your shell's namespaces
ls -la /proc/$$/ns

# Compare with another process
sudo ls -la /proc/1/ns        # PID 1 (init) namespaces

# Each entry is a symlink such as pid:[4026531836]. If the numbers in brackets differ, the two processes are in different namespaces of that type.
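Comparing those links by eye gets tedious. The sketch below (ns_compare is a hypothetical helper, not a standard tool) reads the same symlinks with readlink and reports each namespace type as same or different. It only needs root when one of the processes belongs to another user, as PID 1 does.

```shell
#!/usr/bin/env bash
# Compare two processes namespace by namespace (hypothetical helper).
# Reading another user's /proc/PID/ns links requires root.
ns_compare() {
    local a=$1 b=$2 ns la lb
    for ns in cgroup ipc mnt net pid time user uts; do
        # Skip types this kernel does not have (time needs Linux 5.6+)
        [ -L "/proc/$a/ns/$ns" ] || continue
        la=$(readlink "/proc/$a/ns/$ns") || return 1
        lb=$(readlink "/proc/$b/ns/$ns") || return 1
        if [ "$la" = "$lb" ]; then
            echo "$ns same $la"
        else
            echo "$ns different $la vs $lb"
        fi
    done
}
```

From a root shell, `ns_compare $$ 1` compares that shell with init; comparing a process with itself reports same for every type.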

Create a namespace by hand

# New PID, mount and UTS namespaces; --fork makes bash PID 1 in the new PID namespace
sudo unshare --pid --fork --mount --uts bash

# Inside, run:
hostname container-test           # changes the hostname only inside the new UTS namespace
ps -ef                            # still lists host processes: ps reads /proc, which is still the host's
mount -t proc proc /proc          # safe here: --mount gave this shell its own private mount namespace
ps -ef                            # now only bash (PID 1) and ps itself
exit

You just made (most of) a container.
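If your distribution allows unprivileged user namespaces (some restrict them), the same experiment works without sudo: --user --map-root-user makes you root inside a new user namespace, and that namespace owns the UTS and PID namespaces created alongside it. A minimal sketch; rootless_demo is a hypothetical name. It prints the hostname seen inside, then the inner shell's PID, which should be 1.

```shell
#!/usr/bin/env bash
# Rootless version of the experiment above (hypothetical helper name).
# Line 1 of the output: hostname inside the new UTS namespace.
# Line 2: PID of the inner shell in the new PID namespace.
rootless_demo() {
    unshare --user --map-root-user --uts --pid --fork \
        sh -c 'hostname container-test && hostname && echo "$$"'
}
```

Running hostname afterwards in your normal shell still shows the original name.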

cgroups: resource limits

cgroups (control groups) limit and account for resources such as CPU time, memory, block I/O and the number of processes. Combined with namespaces, they give you isolated processes that can’t hog the machine.

cgroup v2 (modern, default on most distros)

The cgroup v2 tree lives at /sys/fs/cgroup/. Each subdirectory is a control group, and its cgroup.procs file lists the PIDs of the processes in it.

# See all current cgroups
ls /sys/fs/cgroup/

# What cgroup is THIS process in?
cat /proc/self/cgroup
# 0::/user.slice/user-1000.slice/user@1000.service/...
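The path after 0:: is relative to the cgroup v2 mount point, so joining the two gives the directory holding that process's cgroup files. A sketch, assuming a pure cgroup v2 system with the hierarchy mounted at /sys/fs/cgroup; cgroup_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the cgroup v2 directory of a process (default: the calling shell).
# Hypothetical helper; assumes the unified hierarchy is at /sys/fs/cgroup.
cgroup_path() {
    local pid=${1:-$$} rel
    # The v2 entry is the line starting with "0::"
    rel=$(sed -n 's/^0:://p' "/proc/$pid/cgroup") || return 1
    [ -n "$rel" ] || return 1      # no v2 entry: cgroup v1-only system
    echo "/sys/fs/cgroup${rel%/}"
}
```

For example, `cat "$(cgroup_path)/memory.max"` shows the memory limit of the cgroup your shell is in.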

Create a cgroup and limit memory

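# The memory and cpu controllers must be enabled in the parent's cgroup.subtree_control
# (systemd usually does this for the root). If missing: echo "+memory +cpu" | sudo tee /sys/fs/cgroup/cgroup.subtree_control
sudo mkdir /sys/fs/cgroup/myapp
echo "100M" | sudo tee /sys/fs/cgroup/myapp/memory.max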

# Move a running process into it
echo $$ | sudo tee /sys/fs/cgroup/myapp/cgroup.procs

# Now this shell, and anything it starts, is limited to 100 MiB of memory; if reclaim can't keep usage below that, the OOM killer acts inside this cgroup
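Reading the files back shows that the kernel stores sizes in bytes: after the write above, memory.max contains 104857600 (100 × 1024 × 1024), and memory.current holds current usage. A sketch of a hypothetical mem_usage helper that compares the two for a cgroup directory such as /sys/fs/cgroup/myapp; it treats the literal value max as no limit.

```shell
#!/usr/bin/env bash
# Show a cgroup's current memory use against its memory.max limit.
# mem_usage is a hypothetical helper; pass a cgroup v2 directory.
# Both files hold byte counts; memory.max may also be the word "max".
mem_usage() {
    local dir=$1 cur max
    cur=$(cat "$dir/memory.current") || return 1
    max=$(cat "$dir/memory.max") || return 1
    if [ "$max" = max ]; then
        echo "$cur bytes used, no limit"
    else
        echo "$cur of $max bytes used ($(( cur * 100 / max ))%)"
    fi
}
```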

Limit CPU

# cpu.max is "QUOTA PERIOD" in microseconds: 50 ms of CPU time per 100 ms, i.e. half of one CPU
echo "50000 100000" | sudo tee /sys/fs/cgroup/myapp/cpu.max
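Both cpu.max numbers are in microseconds, and the quota may exceed the period on multi-core machines: 200000 100000 allows two full CPUs, and max 100000 removes the limit. A sketch that turns a percentage of one CPU into that format, assuming the default 100000 µs period; cpu_max_value is a hypothetical name.

```shell
#!/usr/bin/env bash
# Build a cpu.max value from a percentage of one CPU, using a 100000 us
# period. Values above 100 span several CPUs (250 means 2.5 CPUs).
# cpu_max_value is a hypothetical helper name.
cpu_max_value() {
    local percent=$1 period=100000
    case $percent in
        ''|*[!0-9]*) echo "percent must be a whole number" >&2; return 1 ;;
    esac
    echo "$(( percent * period / 100 )) $period"
}
```

For example: `cpu_max_value 50 | sudo tee /sys/fs/cgroup/myapp/cpu.max`.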

The systemd way

systemd places every service in its own cgroup, so limits can be set directly in the unit file:

[Service]
MemoryMax=512M
CPUQuota=25%
TasksMax=50
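Limits like these are usually added as a drop-in file rather than by editing the packaged unit: a .conf file in /etc/systemd/system/UNIT.service.d/, followed by systemctl daemon-reload and a restart of the unit (systemctl set-property can also apply such properties to a running unit). A sketch of writing one; write_limits_dropin and the file name limits.conf are made up for illustration.

```shell
#!/usr/bin/env bash
# Write MemoryMax/CPUQuota/TasksMax into a systemd drop-in directory.
# Hypothetical helper: write_limits_dropin DIR MEMORY CPU_QUOTA TASKS
write_limits_dropin() {
    local dir=$1 mem=$2 cpu=$3 tasks=$4
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section and the keys it changes
    cat > "$dir/limits.conf" <<EOF
[Service]
MemoryMax=$mem
CPUQuota=$cpu
TasksMax=$tasks
EOF
}
```

Run as root with a real unit, e.g. `write_limits_dropin /etc/systemd/system/myapp.service.d 512M 25% 50`, where myapp.service is a placeholder.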

Or one-shot via systemd-run:

sudo systemd-run --slice=myapp.slice -p MemoryMax=100M -p CPUQuota=50% sleep 1000

systemd-cgtop                      # live view of cgroup resource usage

Putting it together: a minimal container

OCI runtimes such as runc and crun roughly do the following:

Create namespaces (PID, mount, network, UTS, IPC, and optionally user).
Set up a root filesystem (pivot_root into the image's unpacked filesystem).
Apply cgroups to limit resources.
Drop capabilities (remove root privileges the process does not need).
Apply seccomp and AppArmor/SELinux profiles for further restriction.
Exec the container’s command.
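The steps above can be approximated with standard util-linux tools. This is a rough sketch, not how runc works internally: it uses chroot instead of pivot_root, creates no user namespace, and skips seccomp and LSM profiles. It assumes root, an unpacked root filesystem that contains setpriv, and an existing cgroup v2 directory; crude_container and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# crude_container ROOTFS CGROUP_DIR CMD [ARGS...]  (hypothetical helper)
# Needs root, an unpacked root filesystem containing setpriv, and an
# existing cgroup v2 directory. With DRY_RUN=1 it only prints the plan.
crude_container() {
    local rootfs=$1 cg=$2
    shift 2
    # New PID/mount/UTS/IPC/network namespaces, a fresh /proc for the new
    # PID namespace, chroot into the image, then drop all capabilities
    # before exec'ing the command. No user namespace: this is real root.
    local cmd=(unshare --pid --fork --mount --uts --ipc --net
               --mount-proc="$rootfs/proc"
               chroot "$rootfs"
               setpriv --no-new-privs --inh-caps=-all --bounding-set=-all
               -- "$@")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cgroup: $cg"
        echo "${cmd[*]}"
        return 0
    fi
    (
        # Join the cgroup first; everything exec'd below inherits it
        echo "$BASHPID" > "$cg/cgroup.procs" || exit 1
        exec "${cmd[@]}"
    )
}
```

`DRY_RUN=1 crude_container /srv/rootfs /sys/fs/cgroup/myapp /bin/sh` prints the plan without touching the system; /srv/rootfs is a placeholder path. setpriv runs after chroot because chroot itself needs a capability that the dropped bounding set would remove.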

Docker is a friendly wrapper around all of this: the Docker daemon hands containers to containerd, which in turn invokes runc.

Inspect a Docker container’s namespaces

docker run -d --name mynginx nginx
PID=$(docker inspect -f '{{.State.Pid}}' mynginx)
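sudo ls -la /proc/$PID/ns

# Compare with the host
sudo ls -la /proc/1/ns

# Most numbers differ: the container has its own namespaces. user (and time) usually match, because Docker does not create those by default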
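With the PID in hand, nsenter can run host tools inside just the container's network namespace, which helps when the image ships no networking tools. A sketch, assuming Docker and root; container_netns_exec is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Run a host binary inside a running container's network namespace.
# Hypothetical helper; needs root and a running Docker daemon.
container_netns_exec() {
    local name=$1 pid
    shift
    pid=$(docker inspect -f '{{.State.Pid}}' "$name" 2>/dev/null) || return 1
    # State.Pid is 0 when the container is not running
    [ "${pid:-0}" -gt 0 ] || return 1
    nsenter --target "$pid" --net -- "$@"
}
```

For example, from a root shell, `container_netns_exec mynginx ip addr` lists the container's interfaces using the host's ip binary.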

Why this knowledge matters

Debug “container can’t see X” issues — usually a namespace mismatch.
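Understand security boundaries: with user namespaces, root inside a container maps to an unprivileged host UID; without them it is the host's UID 0, held back only by capabilities, seccomp and LSM profiles.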
Resource issues — “container is OOM-killed” maps directly to memory.max in its cgroup.
Build minimal containers — once you know what’s actually happening, you can craft tighter setups.
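To confirm that a container really was OOM-killed, check the memory.events file in its cgroup: the max line counts how often usage hit memory.max, and oom_kill counts processes killed by the OOM killer. A sketch; oom_kills is a hypothetical helper that takes a cgroup v2 directory.

```shell
#!/usr/bin/env bash
# Print the oom_kill counter from a cgroup v2 directory's memory.events.
# Hypothetical helper; the file holds "key value" lines such as "oom_kill 1".
# Exits non-zero if the file or the oom_kill line is missing.
oom_kills() {
    awk '$1 == "oom_kill" { print $2; found = 1 } END { exit !found }' "$1/memory.events"
}
```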

Useful tools

lsns                   # list all namespaces
nsenter                # enter another process's namespaces
unshare                # create new namespaces
systemd-cgtop          # live cgroup resource usage
systemd-cgls           # cgroup tree
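lsns gets its data from the same /proc/PID/ns symlinks used earlier. A rough approximation in plain shell (ns_count is a hypothetical name) that counts processes per namespace of one type; without root it only sees your own processes.

```shell
#!/usr/bin/env bash
# Count processes per namespace of one type (default: net), most used first.
# Hypothetical helper; a rough stand-in for lsns built on /proc.
ns_count() {
    local type=${1:-net} link
    for link in /proc/[0-9]*/ns/"$type"; do
        # Processes can exit mid-loop, and others' links need root: skip errors
        readlink "$link" 2>/dev/null
    done | sort | uniq -c | sort -rn
}
```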

Common mistakes

Treating containers like VMs: they share the host's kernel, so a kernel vulnerability can let a process escape the container.
Running the container's process as root without user namespaces: that root is UID 0 on the host, so anything that escapes lands as host root.
Forgetting cgroup memory limits in production: one runaway container can exhaust host memory and trigger the host's OOM killer, which may kill unrelated processes.
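Whether root in a container is host root can be read from /proc/PID/uid_map: each line maps a range of UIDs inside the namespace to a range outside, as inside-start outside-start count. A process outside any user namespace shows 0 0 4294967295. A sketch; root_maps_to is a hypothetical helper that takes the path to a uid_map file, for example /proc/PID/uid_map for the PID from docker inspect.

```shell
#!/usr/bin/env bash
# Print the host UID that UID 0 inside a user namespace maps to.
# Hypothetical helper; pass a uid_map file. Output 0 means container root
# is host root. Exits non-zero if UID 0 is not mapped at all.
root_maps_to() {
    awk '$1 == 0 { print $2; found = 1 } END { exit !found }' "$1"
}
```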

What to learn next

Now that the kernel mechanics are clear, Docker — the most common way people actually use these features — is the next and final stop on the roadmap.
