Migrations You Can Prove

Every schema migration is a hope-and-pray script. You write it, you run it against a staging copy that is never quite production, and then one night it runs exactly once against the real data and you find out whether you were right. There is no second take. The rows it mangles are gone. The column it forgot to backfill is now a landmine some query will step on three weeks from now.

We have built an enormous amount of machinery to make this less terrifying, and a lot of it is genuinely good.

Protocol Buffers got something deeply right. Field numbers are stable identities. Rename user_id to account_id and the wire does not care, because the wire only ever knew it as field 4. Add a field, old readers skip it. Retire a field, reserve its number so nobody can reuse it and resurrect old garbage under a new name. That discipline, stable tags plus additive evolution plus reserved retirements, is why a service mesh can roll out one pod at a time without a coordinated big-bang deploy.

serde got something right too. Serialization stops being hand-written byte code and becomes a derived, mechanical fact about your types. The format and the struct are connected by generated code the compiler typechecks, so a field of the wrong type is a build error instead of a corrupt record.

And ORMs with migration tooling got something right. Generated migration scaffolds, up and down pairs, a version table recording what has run, autogenerated diffs from your model. That turned “edit the production database by hand at 2am” into “review a generated diff and apply it in order.” Real progress.

They all share one gap. The migration is code the compiler does not understand as a migration. Protobuf checks wire compatibility, but the function that turns a decoded v1 message into your live in-memory object is just a function. serde derives the codec, but the logic that reconciles an old saved blob with a new struct is just a function. The ORM generates a scaffold, a human fills the body with UPDATE statements, and nothing on Earth checks that those statements cover every row, preserve every invariant, and leave no field undefined. The tooling proves the shape lines up, not that the transformation is correct. You still find out in production.

Two problems that are secretly one problem

Most stacks treat “bytes on a socket” and “bytes on disk” as two unrelated fragile systems. You have a protobuf schema for the network, a separate ORM mapping for the database, and a third ad-hoc format for the cache, each rotting on its own schedule. Three formats, three migration stories, three chances to be wrong.

Omega refuses the split. The on-disk format is wire format. The cached blob is wire format. The RPC payload is wire format. One versioning discipline covers all of them, because they are the same problem: a stable external representation that has to survive across versions of your code.

wire data declares that external schema with explicit field numbers, the protobuf insight taken as a language primitive rather than a sidecar .proto file.

wire data CounterMessage {
    0: counter: i32;
    1: timestamp_millis: i64;

    reserved 3;
}

Field numbers are the contract. Order is not. Renaming is free. Reusing a retired number is illegal. So far this is protobuf, deliberately: Omega keeps the part that worked and stops treating it as a separate ecosystem with its own compiler.

The runtime shape can be completely different, optimized for execution instead of compatibility.

data Counter {
    counter: AtomicI32;
    timestamp: DateTime;
}

The bridge between them is an ordinary machine with a contract, and that machine is where the proof lives.

The migration becomes a proof obligation

A runtime data type can carry its historical shapes alongside its current one, and the compiler typechecks against each.

data Counter {
    version v1 {
        counter: i32;
    }

    counter: AtomicI32;
    timestamp: DateTime;
}

Counter is the current type. Counter::v1 is the old shape, a real type the compiler still knows. Old state is not pretended into the new shape, and new fields are not silently available on old state. Write a v1 machine and reach for self.timestamp and that is a compile error, because v1 never had a timestamp.

The migration from old to new is a machine carrying the obligations that turn it from a hopeful script into a checked transformation.

machine Counter::from_v1(
    old: Counter::v1,
    out: &mut Counter
)
satisfies RuntimeMigratable<Counter::v1, Counter>
effects
    alloc
requires
    exclusive(old)
ensures
    Counter::invariants(out)
{
    out.counter = AtomicI32::new(old.counter);
    out.timestamp = DateTime::now();
}

The ensures makes the migration establish Counter::invariants(out) before the build accepts it. A forgotten field becomes a compiler error rather than a latent bug, because out.timestamp is part of the current shape and leaving it unestablished fails the invariant obligation. The mapping has to be total: every reachable path produces a fully valid current Counter, or it does not compile. The effects alloc line says this migration allocates and nothing more, no surprise device touch, no blocking call hiding in a “simple” data fix. The requires exclusive(old) says nobody else holds a reference to the old state while you transform it, so the migration cannot race the very thing it is rewriting.

This is the same proof discipline Omega uses everywhere, pointed at the one operation that historically had no proofs at all. Values carry facts, operations carry contracts, contracts create obligations, and the compiler discharges them before the artifact ships. The migration is not exempt for being “just data plumbing.”

Migrations compose, the way you would hope.

v1 -> v2 -> current

Ask to bring v1 state forward to the current Counter and the compiler walks the known chain. If a link is missing, the operation is simply unavailable until you write it. No silent identity-mapping, no “it’ll probably be fine,” no version that quietly falls through to garbage.

Live migration as a designed operation

The real payoff is upgrading state that is alive: an OS that swaps a driver without rebooting, a long-running service that replaces a component while requests are in flight. The migration runs against a moving system instead of a quiet database, exactly when the hope-and-pray model is most lethal.

Omega makes the swap a controlled sequence with stated obligations:

old machine quiescent -> migrate state -> install new machine -> resume

Before that swap is allowed, the compiler and runtime have to establish the dangerous facts humans normally just assume. Quiescence: no core is currently executing inside the machine being replaced. Borrow safety: no outstanding reference points into state the migration will invalidate, and no queued callback, timer, or interrupt continuation can re-enter the old code after the swap. Invariant preservation: the new state satisfies the new contract. These are the borrow checker, the invariant checker, and the effect checker doing their ordinary jobs at a boundary that used to be a blind spot, not new checks bolted on for migrations.

When the swap cannot be proven safe, the build tells you why, in terms of the actual blockers:

machine Driver:
  replacement blocked:
    pending interrupt callback targets Driver::v1::complete_request
    outstanding borrow of DriverState::v1 held by scheduler queue

The legacy model finds out at 2am that an interrupt fired into freed code. Omega finds out at build time that an interrupt could, and refuses to ship the swap until you have fenced it.

None of this claims migration is free. You still write the transformation. You still decide what an old field maps to and whether now() is an honest default for a timestamp you never recorded. Omega does not invent the intent. It refuses to let the intent be incomplete, untotal, or invariant-breaking, and it does so for the network format, the disk format, and the live in-memory upgrade with one discipline instead of three.

A migration in Omega is a transformation the compiler made you finish before it would let you run it once.