Copying a Terabyte Should Be Instant

You select a folder, you hit copy, and a progress bar settles into the familiar lie of “about 4 minutes.”

The bytes are already on the disk. You are not asking the machine to create new information, you are asking it to make a second thing identical to a first thing. The progress bar exists because the operating system does not know those two things are the same, so it dutifully reads every byte and writes it back down a few tracks over.

That ignorance is the bug. Fix the ignorance and the progress bar disappears.

What “copy” actually means

What is a file’s name? On a traditional filesystem, a name points at an inode, the inode points at a list of disk blocks, and so the name is bound to a location: “this file lives at these sectors.” With that model, the only honest way to copy is to allocate new sectors and stream the bytes across, because a copy means a second set of blocks and a second set of blocks has to be filled.

Now make a file’s name point at its content, identified by the hash of that content, instead of at a location.

A hash is a short fingerprint computed from bytes. Run it over a file and you get a 256-bit value that, for any practical purpose, no other file’s bytes will ever produce. Cathedral uses BLAKE3, a modern cryptographic hash that is fast, parallel, and built out of an internal tree so you can verify a slice of a huge file without rehashing the whole thing. The hash is the file’s address, not metadata beside it. Content lives in one big store keyed by hash, and a name is just a pointer into it: report.pdf -> blake3:9f86d0....

Now a copy is two names referring to the same bytes. The bytes already exist and already have an address, so a copy is one more pointer to the same hash. No bytes move. A four-kilobyte file and a four-hundred-gigabyte file cost the same, because you are not touching the content at all, only adding a reference. Copying is O(1), independent of size.
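To make that concrete, here is a minimal sketch of the model in Python, not Cathedral’s actual code. It assumes an in-memory dictionary as the content store and uses SHA-256 from the standard library as a stand-in for BLAKE3, which the standard library does not ship. The names `address`, `write`, and `copy` are hypothetical.

```python
import hashlib

def address(data: bytes) -> str:
    # Stand-in hash: SHA-256, NOT BLAKE3. Only the idea matters here:
    # the address is computed from the bytes themselves.
    return "sha256:" + hashlib.sha256(data).hexdigest()

objects = {}  # content store: address -> bytes
names = {}    # namespace: name -> address

def write(name: str, data: bytes) -> None:
    addr = address(data)
    objects.setdefault(addr, data)  # identical bytes are stored once
    names[name] = addr

def copy(src: str, dst: str) -> None:
    # The whole copy: one more name pointing at an existing address.
    # The content is never read, so file size does not matter.
    names[dst] = names[src]

write("report.pdf", b"x" * 4096)
copy("report.pdf", "report-copy.pdf")
```

After the copy, both names hold the same address and `objects` still contains a single entry. The body of `copy` would be the same line for a four-hundred-gigabyte file.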

Copy a folder of a million files and the same logic holds one level up. A directory is also just content: the list of name -> hash entries for its children. So a directory has a hash too, and that hash transitively names everything beneath it. Copying the whole tree is one new reference to the existing root, not a walk that recreates a million nodes.
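A sketch of how a directory might get its own hash, under assumptions of my own choosing for illustration: entries are serialized as sorted `name<TAB>hash` lines, a `blob` or `tree` prefix keeps files and directories from colliding (an idea borrowed from Git), and SHA-256 again stands in for BLAKE3. The function names are hypothetical.

```python
import hashlib

def address(payload: bytes) -> str:
    # Stand-in hash: SHA-256, NOT BLAKE3.
    return hashlib.sha256(payload).hexdigest()

def hash_file(data: bytes) -> str:
    return address(b"blob\0" + data)

def hash_dir(entries: dict) -> str:
    # entries maps child name -> child hash. Sorting makes the
    # serialization canonical: same entries, same bytes, same hash.
    lines = [f"{name}\t{child}\n" for name, child in sorted(entries.items())]
    return address(b"tree\0" + "".join(lines).encode("utf-8"))

docs = hash_dir({
    "a.txt": hash_file(b"hello"),
    "b.txt": hash_file(b"world"),
})

# "Copying" the docs folder is one new entry carrying the same hash.
root = hash_dir({"docs": docs, "docs-copy": docs})

# Changing one file deep in the tree changes the directory hash above it.
docs_edited = hash_dir({
    "a.txt": hash_file(b"hello!"),
    "b.txt": hash_file(b"world"),
})
```

Here `docs` names both files at once, and `docs_edited` differs from it even though only one child changed.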

That is what “copying a terabyte should be instant” means: a data model where the copy was never work, not a faster disk.

The structure underneath: a Merkle DAG

If files and directories are both named by the hash of their content, and a directory’s content includes its children’s hashes, then the whole filesystem is a tree of hashes where each node’s identity is computed from its children’s. That structure is a Merkle tree, or here a Merkle DAG (directed acyclic graph), because the same node can have more than one parent.

That “more than one parent” is what makes copies free and keeps them safe.

When two folders contain an identical file, they do not each carry their own copy. They point at the same content node, because identical bytes produce an identical hash and an identical hash is the same address. Sharing is the default. The question becomes “how would two identical things ever fail to be the same object,” and the answer is they cannot.

What happens when you edit one of those shared copies? Nodes are immutable. A content node, once written, is never modified in place, because changing its bytes would change its hash and make it a different node. When you edit your file you produce new bytes, those new bytes have a new hash, and the system writes a new node and points your name at it. The old node is untouched, and the other folder pointing at it never notices. Because a directory’s content includes its children’s hashes, the directory holding your name gets a new hash too, and so does its parent. Your edit therefore walks from the change up to the root, writing a fresh node at each step along that one path and leaving every other subtree shared. This is copy-on-write, at a cost proportional to the depth of the tree, not its size.
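The path-copying edit can be sketched as follows. This is an illustration under stated assumptions, not the real implementation: nodes live in an in-memory dictionary, directories are plain dicts serialized as sorted JSON, SHA-256 stands in for BLAKE3, and `put` and `write_path` are hypothetical names.

```python
import hashlib
import json

store = {}  # address -> node (bytes for a file, dict for a directory)

def put(node) -> str:
    # Stand-in hash: SHA-256, NOT BLAKE3.
    if isinstance(node, bytes):
        payload = b"blob\0" + node
    else:
        payload = b"tree\0" + json.dumps(node, sort_keys=True).encode("utf-8")
    addr = hashlib.sha256(payload).hexdigest()
    store.setdefault(addr, node)  # existing nodes are never overwritten
    return addr

def write_path(root: str, path: list, data: bytes) -> str:
    """Return the address of a new root with `data` stored at `path`."""
    name, rest = path[0], path[1:]
    entries = dict(store[root])  # copy the entries; the old node stays as is
    if rest:
        entries[name] = write_path(entries[name], rest, data)
    else:
        entries[name] = put(data)
    return put(entries)

shared = put(b"draft v1")
photos = put({"cat.jpg": put(b"meow")})
old_root = put({
    "alice": put({"notes.txt": shared}),
    "bob": put({"notes.txt": shared}),
    "photos": photos,
})

new_root = write_path(old_root, ["alice", "notes.txt"], b"draft v2")
```

Only three nodes are written for the edit: the new file, a new `alice` directory, and a new root. The `bob` and `photos` entries in the new root carry the same addresses as before, and `old_root` still describes the tree as it was.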

A bonus falls out of building identity from children’s hashes: the graph cannot contain a cycle. A node’s hash depends on its children’s hashes, so pointing back at an ancestor would require knowing a hash before computing it. Impossible by construction. Traversal always terminates and never needs cycle detection, which is more than symlinks can promise.

Snapshots and integrity were already paid for

Once content identity is the primitive, two features people normally buy separately fall out of the design.

A snapshot is a copy of a realm’s root, which is a single reference. Freeze the root hash and you have frozen the entire state of that subtree at that instant, immutably, for the cost of recording one hash. You can keep a snapshot every hour for a year and pay only for the bytes that changed between them, because everything unchanged is still the same shared nodes.
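A small sketch of why a second snapshot costs only the difference, assuming the same toy representation as a plain in-memory store with SHA-256 standing in for BLAKE3. The helper `reachable` and the snapshot labels are hypothetical, and the second version of the root is built by hand to keep the example short.

```python
import hashlib
import json

store = {}      # address -> node (bytes for a file, dict for a directory)
snapshots = {}  # label -> root address; this is the entire snapshot

def put(node) -> str:
    # Stand-in hash: SHA-256, NOT BLAKE3.
    if isinstance(node, bytes):
        payload = b"blob\0" + node
    else:
        payload = b"tree\0" + json.dumps(node, sort_keys=True).encode("utf-8")
    addr = hashlib.sha256(payload).hexdigest()
    store.setdefault(addr, node)
    return addr

def reachable(root: str) -> set:
    # Every address a root transitively names.
    seen, stack = set(), [root]
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue
        seen.add(addr)
        node = store[addr]
        if isinstance(node, dict):
            stack.extend(node.values())
    return seen

video = put(b"\x00" * 1_000_000)
v1 = put({"video.bin": video, "todo.txt": put(b"buy milk")})
snapshots["09:00"] = v1

v2 = put({"video.bin": video, "todo.txt": put(b"buy milk, eggs")})
snapshots["10:00"] = v2

# Nodes the second snapshot needs that the first did not already hold.
new_nodes = reachable(v2) - reachable(v1)
```

The large file is shared by both snapshots, so `new_nodes` holds only the edited note and the new root.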

Integrity is the better story. On a normal filesystem a checksum is an extra thing you compute, store beside the data, and hope someone checks. Here the address is the checksum, a strong cryptographic one. Read content back, rehash the bytes, and confirm they produce the address you asked for. If a disk has silently rotted a bit, the bytes no longer hash to their own name, and the corruption is caught on read instead of served to you as if it were fine. This is integrity by construction, a property you cannot turn off, because asking for content by its hash and getting different content back is a contradiction the system detects immediately.
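Verification on read is short enough to sketch in full. As before this is an assumption-laden illustration: a dictionary plays the disk, SHA-256 stands in for BLAKE3, and `read_verified` and `CorruptObject` are hypothetical names.

```python
import hashlib

class CorruptObject(Exception):
    """The stored bytes no longer hash to the address they live under."""

def address(data: bytes) -> str:
    # Stand-in hash: SHA-256, NOT BLAKE3.
    return hashlib.sha256(data).hexdigest()

def read_verified(store: dict, addr: str) -> bytes:
    data = store[addr]
    if address(data) != addr:
        raise CorruptObject(addr)
    return data

store = {}
addr = address(b"important ledger")
store[addr] = b"important ledger"

# Simulate silent bit rot: flip one bit of the stored bytes.
rotted = bytearray(store[addr])
rotted[0] ^= 0x01
store[addr] = bytes(rotted)

try:
    read_verified(store, addr)
    caught = False
except CorruptObject:
    caught = True
```

There is no separate checksum field to forget to check. The request itself carries the value the bytes must hash to.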

And because a directory’s hash certifies its children and the root’s hash certifies the directories, a single root hash certifies that every byte beneath it is exactly what it was. Tampering anywhere changes a hash somewhere, and that change propagates to the top. One number vouches for the entire tree. That is the property that lets Git tell you a repository has not been altered, applied to a whole live filesystem.

The honest part: this is not a new idea, and it is not free

Content-addressed storage was not invented here, and the systems that pioneered it deserve real credit.

ZFS and btrfs already do most of what people imagine is exotic about this. Copy-on-write. Cheap clones and snapshots that share unchanged blocks. Checksums that catch bit rot and, with redundancy, repair it. A NAS running ZFS gives you instant snapshots and self-healing storage today, and that is excellent engineering that predates anything I am building. Git has been a content-addressed Merkle DAG of your source code for two decades, which is exactly why a commit hash can vouch for an entire tree of files. Nix builds whole packages addressed by the hash of their inputs. None of this is science fiction. It ships.

So what is the actual delta?

In ZFS and btrfs, content identity is an implementation technique under a conventional filesystem. The clone is cheap, but it is a special operation you invoke; the checksum is real, but it is bookkeeping beside the block, not the block’s name; dedup is optional and famously expensive to turn on. The byte-level POSIX namespace on top still thinks in files at locations. Content addressing stays in the basement instead of shaping the architecture.

Cathedral makes content identity the primitive the entire namespace is built on, all the way up. A name is defined as a reference to a hash; there is no other kind of name. Copy is simply what writing a second reference amounts to, not a special clone operation. Dedup is the consequence of identical bytes being the same object, not a mode you enable. Snapshot is freezing a root, not a subsystem. The same substrate then reaches past the filesystem into packages, updates, and boot integrity, because once everything is addressed by content, those problems turn out to be the same problem. That is a separate essay, but it is the reason this matters: the win is collapsing a dozen features into one idea.

It costs something real. If content identity is the primitive, the system must maintain a map from every hash to where its bytes live, the hash -> location index. That index is the dedup table, the exact thing that gave ZFS deduplication its terrible reputation: turn it on naively and the table grows until it no longer fits in RAM, and performance falls off a cliff. Making dedup the identity means the index is mandatory, so it has to stay bounded by design: indexed at object granularity rather than per-block, kept on a fast flash device as a log-structured tree with bloom filters so a lookup is usually a RAM hit and at worst one flash read, and partitioned per realm so only the parts you are using need to be resident. The worst case, random access across more unique content than fits in the cache, is genuinely the hard case this model creates, and pretending otherwise would be dishonest.
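One piece of that design, the bloom filter sitting in front of the slow tier, can be sketched in a few lines. This is a toy under loud assumptions: a Python dict stands in for the on-flash log-structured tree, the filter size and hash count are arbitrary, and per-realm partitioning is left out entirely. `BloomFilter` and `HashIndex` are hypothetical names.

```python
import hashlib

class BloomFilter:
    """Compact in-RAM set: may report false positives, never false negatives."""

    def __init__(self, bits: int = 8192, hashes: int = 4):
        self.bits = bits              # must be a multiple of 8
        self.hashes = hashes          # at most 8 with a 32-byte digest
        self.array = bytearray(bits // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        for i in range(self.hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

class HashIndex:
    """hash -> location, with a RAM filter guarding a slow tier."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.slow = {}        # stand-in for the on-flash tree
        self.slow_reads = 0   # how often the slow tier was consulted

    def insert(self, addr: str, location: int) -> None:
        self.bloom.add(addr)
        self.slow[addr] = location

    def lookup(self, addr: str):
        if not self.bloom.might_contain(addr):
            return None       # definitely absent, answered from RAM
        self.slow_reads += 1  # possibly present: one slow read
        return self.slow.get(addr)

index = HashIndex()
for i in range(100):
    index.insert(f"object-{i}", i * 4096)
```

The filter can only say “definitely not here” or “maybe here,” so it saves the slow read for addresses the store has never seen, which is the common case when new content is being written. It does nothing for the hard case described above, where the lookups are for content that does exist but is not cached.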

That is the trade. You pay for an index, carefully, once. In return, copying stops being work, dedup and snapshots stop being features, and integrity stops being optional.

The mental model is small enough to hold in one hand: a name is a pointer to content identified by its hash, so copying is repointing, and the hash that addresses the bytes is the same hash that certifies them.