Docker Is a Workaround for a Broken OS

A container fakes two things the operating system should have handed you directly: a clean namespace and real isolation.

Docker deserves respect. Given the Linux it had to work with, it solved real problems extraordinarily well: packaging an application with its exact dependencies so it runs the same everywhere, isolating a workload from its neighbors on a shared host, making a deployment reproducible from a declarative file. These were genuine, painful, expensive problems, and containers made them tractable enough to rebuild the entire industry’s deployment story around them.

But look at how it is built, and the shape of the workaround becomes obvious.

What a container actually is

A Linux container is a carefully assembled stack of separate mechanisms, each invented to carve a private view out of a system where everything is global by default:

Namespaces (mount, PID, network, user, and friends) give the process a private view of the filesystem tree, the process table, the network stack, and user IDs. There are several of them because there are several global things to hide.
cgroups cap how much CPU, memory, and I/O the process can consume, because by default it draws on the whole machine.
A union filesystem (overlayfs and its kin) layers a writable scratch space over read-only image layers, so the container can edit “the filesystem” without touching the base.
seccomp filters which system calls the process may make, and AppArmor or SELinux restricts what those calls can reach, because by default it can make all of them.

Every one of those exists to subtract. The process starts able to see the global mount tree, the global process list, the global network, the whole syscall surface, the whole machine’s resources. Namespaces hide the tree, cgroups cap the resources, seccomp trims the syscalls, union mounts fake a private root. The container is the negative space left after you claw all of that back.
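
The subtraction can be made concrete with a toy model. This is a sketch, not how the kernel works: the GLOBAL_VIEW dictionary and the containerize function are invented for illustration, and each argument stands in for one of the mechanisms above. Real PID namespaces also renumber processes, which the model ignores.

```python
# Toy model only: these sets stand in for real kernel state.
GLOBAL_VIEW = {
    "mounts": {"/", "/etc", "/home", "/var"},
    "pids": {1, 42, 97, 1337},
    "syscalls": {"read", "write", "openat", "mount", "ptrace", "reboot"},
    "memory_mb": 65536,
}


def containerize(view, visible_mounts, visible_pids, allowed_syscalls, memory_cap_mb):
    """Return what is left after each mechanism subtracts from the global view."""
    return {
        "mounts": view["mounts"] & visible_mounts,           # mount namespace
        "pids": view["pids"] & visible_pids,                 # PID namespace
        "syscalls": view["syscalls"] & allowed_syscalls,     # seccomp filter
        "memory_mb": min(view["memory_mb"], memory_cap_mb),  # cgroup limit
    }


container_view = containerize(
    GLOBAL_VIEW,
    visible_mounts={"/", "/etc"},
    visible_pids={1337},
    allowed_syscalls={"read", "write", "openat"},
    memory_cap_mb=512,
)
print(container_view)
```

Every argument is an intersection or a minimum. Nothing in the call grants anything; it only takes away.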

It works. It is also a lot of distinct machinery. The kernel offers each piece separately and has no first-class notion of a container; that exists only as the composition we now call an image and a runtime. It had to be assembled because Linux has one global root by default: a single /, against which every process resolves names. So “give this program its own world” means building scaffolding to lie to it about the one world that exists.

Cathedral has no global root to hide

A container has to be built because the default is shared. Cathedral inverts the default.

There is no single global tree. The store is partitioned into realms: independent rooted namespaces. System is one realm, user another, each app its own. You can name what is inside a realm only if you hold a capability to its root, and there is no universal / that the realms hang beneath.

Resolution is relative to a root you hold, not to a global path. Each principal carries a small private resolution environment: a table binding realm names like system: and user: to actual root nodes. The name is shared vocabulary; the binding is private and per-principal. The host sets that table when it spawns the program, choosing what each name points at.

That one indirection is the whole trick. The host user’s system: binds to the real system root. A sandboxed app’s system: can bind to an overlay, a filtered view, or a wholly synthetic root. The same string, system://..., resolves to different objects for different programs. An unmodified app that hardcodes a system path is sandboxed by binding its system: somewhere else, the way a container rebinds /, except there is no namespace mechanism to set up. The binding was always the host’s to choose, because the host owns the tree.
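
A minimal sketch of that indirection, under loud assumptions: ResolutionEnv, its resolve method, and the nested-dict roots are all hypothetical names made up for this example, not Cathedral's actual interface. The point is only that the realm name is looked up in a per-principal table before any path is walked.

```python
class ResolutionEnv:
    """Hypothetical per-principal table binding realm names to root nodes."""

    def __init__(self, bindings):
        self._bindings = dict(bindings)

    def resolve(self, name):
        realm, sep, path = name.partition("://")
        if not sep:
            raise ValueError(f"not a realm-qualified name: {name!r}")
        if realm not in self._bindings:
            # No binding means no capability: the realm cannot even be named.
            raise PermissionError(f"no root held for realm {realm!r}")
        node = self._bindings[realm]
        for part in filter(None, path.split("/")):
            node = node[part]  # KeyError if the object does not exist
        return node


# Roots are modeled as nested dicts purely for illustration.
real_system = {"etc": {"hostname": "workstation"}}
synthetic_system = {"etc": {"hostname": "sandbox"}}
user_home = {"notes.txt": "private"}

# The host chooses each table when it spawns the program.
host_env = ResolutionEnv({"system": real_system, "user": user_home})
app_env = ResolutionEnv({"system": synthetic_system})

print(host_env.resolve("system://etc/hostname"))  # workstation
print(app_env.resolve("system://etc/hostname"))   # sandbox
```

The sandboxed program was given no user: binding at all, so app_env.resolve("user://notes.txt") raises PermissionError here. There is nothing to hide, because nothing was ever reachable.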

Everything Docker subtracts, Cathedral hands you as a primitive

A realm root is just an interface: resolve a name, list children, read or write an object. The program only ever talks to the root capability it holds, with no global root to escape to. Whatever sits behind that root is the program’s entire reality, and it cannot tell what kind of root it was given. That is where the container’s bag of tricks turns into single primitives.

Attenuated. The real system:// root, narrowed to read-only. The program sees the truth and cannot write it.
A view. A root that passes reads through to a real realm but hides or substitutes subtrees. Directories are already views, so a masking root is nothing new.
A copy-on-write overlay. Reads fall through to a base realm; writes land in a private per-program layer. The program edits “system files” while really writing its own overlay, the base untouched. Because object bodies are content-addressed, the base is shared with no copy. That is overlayfs for free, from the storage model rather than a separate filesystem driver.
Fully synthetic. The root is backed by a provider that fabricates the whole tree: a made-up system://, a made-up user:// full of apps and files that do not exist. The program believes it is running on a complete OS.
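
The four kinds of root can be sketched as four small classes behind one interface. Everything here is an assumption made for illustration: the class names, the read/write/list methods, and the flat path-to-body dictionaries are not Cathedral's real API, and the masked view is read-only only to keep the sketch short.

```python
class RealRoot:
    """A plain realm root: path -> object body."""
    def __init__(self, objects):
        self._objects = dict(objects)
    def read(self, path):
        return self._objects[path]
    def write(self, path, body):
        self._objects[path] = body
    def list(self):
        return sorted(self._objects)

class Attenuated:
    """The real root, narrowed to read-only."""
    def __init__(self, base):
        self._base = base
    def read(self, path):
        return self._base.read(path)
    def write(self, path, body):
        raise PermissionError("read-only root")
    def list(self):
        return self._base.list()

class MaskedView:
    """Passes reads through but hides every path under the given prefixes."""
    def __init__(self, base, hidden_prefixes):
        self._base = base
        self._hidden = tuple(hidden_prefixes)
    def _visible(self, path):
        return not path.startswith(self._hidden)
    def read(self, path):
        if not self._visible(path):
            raise KeyError(path)  # looks the same as "does not exist"
        return self._base.read(path)
    def write(self, path, body):
        raise PermissionError("view is read-only in this sketch")
    def list(self):
        return [p for p in self._base.list() if self._visible(p)]

class Overlay:
    """Copy-on-write: reads fall through, writes land in a private layer."""
    def __init__(self, base):
        self._base = base  # shared by reference, never copied
        self._layer = {}
    def read(self, path):
        if path in self._layer:
            return self._layer[path]
        return self._base.read(path)
    def write(self, path, body):
        self._layer[path] = body
    def list(self):
        return sorted(set(self._base.list()) | set(self._layer))

class Synthetic:
    """Backed by a provider function that fabricates object bodies."""
    def __init__(self, fabricate, names):
        self._fabricate = fabricate
        self._names = set(names)
    def read(self, path):
        if path not in self._names:
            raise KeyError(path)
        return self._fabricate(path)
    def write(self, path, body):
        raise PermissionError("synthetic root is read-only in this sketch")
    def list(self):
        return sorted(self._names)

def program(root):
    """An unmodified program: it only ever talks to the root it was handed."""
    return root.read("etc/hostname")

system = RealRoot({"etc/hostname": "workstation", "etc/shadow": "secret"})
overlay = Overlay(system)
overlay.write("etc/hostname", "patched")

print(program(Attenuated(system)))  # workstation
print(program(overlay))             # patched
print(system.read("etc/hostname"))  # workstation
```

The program function never changes. Only the object behind its root does, which is the sense in which it cannot tell what kind of root it was given.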

The rest of the container stack collapses the same way. “What processes are running” is already capability-scoped, so a process listing inside the sandbox shows a fabricated or empty set, no pid namespace required. The network is flow authorizations to specific peers, so a private or synthetic network is a provider answering them. Resource limits are budgets, which is what cgroups were approximating. Syscall filtering is the effect ceiling, declared rather than assembled out of seccomp rules.
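
As a rough sketch of how those four pieces might sit together in one declaration: the Sandbox and Budget classes below, their field names, and the example peer name are hypothetical, chosen only to mirror the terms used above (scoped process listing, flow authorizations, budgets, effect ceiling).

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    """A resource budget that is spent down as the program runs."""
    limit: int
    used: int = 0

    def charge(self, amount):
        if self.used + amount > self.limit:
            raise BudgetExceeded(f"{self.used + amount} > {self.limit}")
        self.used += amount


@dataclass
class Sandbox:
    visible_processes: frozenset  # all a process listing may return
    allowed_peers: frozenset      # flow authorizations to specific peers
    effect_ceiling: frozenset     # declared effects; anything else is denied
    memory: Budget

    def list_processes(self):
        return sorted(self.visible_processes)

    def connect(self, peer):
        if peer not in self.allowed_peers:
            raise PermissionError(f"no flow authorization for {peer!r}")
        return f"connected to {peer}"

    def perform(self, effect):
        if effect not in self.effect_ceiling:
            raise PermissionError(f"effect {effect!r} is above the ceiling")
        return effect


box = Sandbox(
    visible_processes=frozenset({"self"}),
    allowed_peers=frozenset({"updates.example"}),
    effect_ceiling=frozenset({"read", "write"}),
    memory=Budget(limit=512),
)
box.memory.charge(400)
print(box.list_processes())  # ['self']
```

Each field is stated up front as what the program may do. None of it is a filter installed over a wider default.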

So a “container” in Cathedral is an app handed synthetic or attenuated roots plus a scoped capability set. No container runtime, no image format. The sandbox is the default you get by not handing over the real roots, and it nests: a sandboxed program can hand its own children further-narrowed roots, because the same machinery applies one level down.
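
Nesting is easy to model. In this sketch the Root and Narrowed classes and the example paths are invented for illustration; narrowing is done by path prefix only because that is the shortest thing to write down. What matters is that a narrowed root delegates to its parent, so every level's limit still applies.

```python
class Root:
    """A host-owned root: path -> object body."""
    def __init__(self, objects):
        self._objects = dict(objects)

    def read(self, path):
        return self._objects[path]


class Narrowed:
    """A root restricted to paths under one prefix of its parent root."""
    def __init__(self, parent, prefix):
        self._parent = parent
        self._prefix = prefix

    def read(self, path):
        if not path.startswith(self._prefix):
            raise PermissionError(path)
        # Delegate upward: the parent enforces its own limit as well.
        return self._parent.read(path)


host = Root({
    "system/passwd": "real",
    "apps/editor/config": "theme=dark",
    "apps/editor/plugins/spell/dict": "words",
})

# The host narrows for the app; the app narrows again for its plugin.
app_root = Narrowed(host, "apps/editor/")
plugin_root = Narrowed(app_root, "apps/editor/plugins/spell/")

print(plugin_root.read("apps/editor/plugins/spell/dict"))  # words
```

A child cannot widen what it was given. If the app tried Narrowed(app_root, "system/"), a read of "system/passwd" would pass the new check and then fail at app_root, one level up.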

Take it to the limit. An app can be handed not just a synthetic root but the authority over that realm: the power to mint capabilities, grant and revoke them, spawn privileged children, carve sub-realms. Then it is genuinely root of its own world, running a miniature Cathedral, sandboxing its own children exactly as it was sandboxed. One rule keeps that safe: a realm-authority can only mint capabilities over objects within its own realm. The app can mint endlessly, and every minted capability resolves to a synthetic object inside its realm, never a real one. Sovereign inside, contained outside, because the host owns the realm that the app’s realm lives in.
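
The one rule can be sketched in a few lines. The Realm, Capability, and RealmAuthority classes are hypothetical stand-ins, and this sketch assumes that "within its own realm" includes sub-realms the authority has carved. Revocation and spawning are left out.

```python
from dataclasses import dataclass


class Realm:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def contains(self, other):
        """True if `other` is this realm or a sub-realm carved out of it."""
        while other is not None:
            if other is self:
                return True
            other = other.parent
        return False


@dataclass(frozen=True)
class Capability:
    realm: Realm
    path: str
    rights: frozenset


class RealmAuthority:
    """The power to mint capabilities, limited to one realm and its sub-realms."""

    def __init__(self, realm):
        self.realm = realm

    def carve(self, name):
        return Realm(name, parent=self.realm)

    def mint(self, realm, path, rights):
        if not self.realm.contains(realm):
            raise PermissionError(
                f"{self.realm.name!r} cannot mint over realm {realm.name!r}"
            )
        return Capability(realm, path, frozenset(rights))


host = Realm("host")
app_realm = Realm("app", parent=host)  # created by the host for the app
app_authority = RealmAuthority(app_realm)

plugin_realm = app_authority.carve("plugin")
cap = app_authority.mint(plugin_realm, "data/cache", {"read", "write"})
print(cap.realm.name, cap.path)  # plugin data/cache
```

Calling app_authority.mint(host, "system/passwd", {"read"}) raises PermissionError. The check walks upward from the target realm and never finds the app's realm, because the host's realm is not inside it.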

“It works on my machine” dissolves

“It works on my machine” is a namespace failure. The program ran against a global tree that happened to have the right library at the right path on the developer’s box and a different one in production. The shared global state was an implicit dependency nobody declared.

When a program’s namespace is constructed and scoped instead of inherited, that failure mode goes away. The program resolves names against the root it was handed, and that root contains exactly what was put behind it. There is no ambient /usr/lib to differ between machines, because there is no ambient anything. Reproducibility stops being something you achieve by shipping a tarball of an entire userland and becomes a property of how naming works.
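
One way to picture this, as a sketch resting on assumptions: object bodies are content-addressed as described earlier, but the build_root and root_id functions and the idea of giving a whole root an identity hash are inventions for this example. Two machines that construct a root from the same declared contents get the same root, and any difference shows up as a different identity instead of a surprise at run time.

```python
import hashlib


def build_root(manifest):
    """Construct a root holding exactly the declared objects, nothing ambient.

    Each name maps to the SHA-256 digest of its body (content addressing).
    """
    return {name: hashlib.sha256(body).hexdigest() for name, body in manifest.items()}


def root_id(root):
    """Identity of a root: a hash over its sorted (name, body digest) pairs."""
    h = hashlib.sha256()
    for name in sorted(root):
        h.update(name.encode() + b"\0" + root[name].encode() + b"\n")
    return h.hexdigest()


manifest = {"lib/libfoo": b"libfoo 1.2 bytes", "bin/app": b"app bytes"}

# Same declared contents on two machines, assembled in a different order.
dev_root = build_root(manifest)
prod_root = build_root(dict(reversed(list(manifest.items()))))

# A machine that happens to have a different library version.
drifted_root = build_root({**manifest, "lib/libfoo": b"libfoo 1.3 bytes"})

print(root_id(dev_root) == root_id(prod_root))     # True
print(root_id(dev_root) == root_id(drifted_root))  # False
```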

Docker made the namespace constructible, brilliantly, on a system whose default is a shared global root, which forced the construction into an external apparatus: an image, a runtime, a daemon, a registry. Cathedral makes the namespace constructed by default, so the container stops being a separate concept bolted on top of the OS. A program’s relationship to the filesystem already is the sandbox.

The honest limit

One thing containers do that realms do not get for free: run a binary built for a different operating system. A Linux executable wants /proc, fork, signals, mmap, and a pile of ioctls; fabricating a convincing system:// does nothing for it. It wants Linux’s surfaces, not Cathedral’s, and giving it those means emulating the Linux ABI, a real compatibility layer and real work. The realm hands you the namespacing and isolation for nothing. The foreign syscall surface is a separate job.

But this is the much smaller problem. Docker spends enormous effort on namespace and isolation, and a Linux container still runs only Linux binaries, because it shares the host’s Linux kernel. Cathedral gets namespace and isolation as a side effect of having no global root, and pays the ABI cost only when it actually wants to run something foreign.

The mental model

A container is the answer to “how do I give a program its own world on a system where there is only one world.” Cathedral’s answer is to never have built the one world in the first place: a program’s reality is just the roots and capabilities you handed it, so the container was the OS all along.