Networked State Machines as First-Class Citizens

State machines are used extensively by the vast majority of games. It is nearly impossible to imagine implementing menu flows or turn-based games without them.

Generally, these state machines come in the form of code: an enum to control the state, and logic to transition from one enum value to another. These are often simplified further by wrapping state transitions in macros that denote which states can transfer to or from which other states. This is boilerplate that must be repeated again and again for each ad hoc state machine in the codebase.

Programmers have repeated this pattern with great success time and time again. However, in doing so, we have made a grave mistake. We have settled for the second-best move. A better move exists.

To understand the problem, we need to understand the almighty Tick. Want to move to a new menu state? Simply check for input or an event in Tick, and update a global state variable. The state machine is a second-class citizen, subservient to the almighty Tick. Some developers try to “clean this up” by firing events from the eternal ether, but this often makes debugging worse. Breakpoints on transitions give nothing except an incoherent call stack that leads to some event handler code. Developers either choose to live in this hellscape, or come crawling back to the Tick.
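As a rough illustration of the pattern described so far, here is a minimal Python sketch of a second-class menu state machine: an enum, a hand-maintained transition table, and a Tick function that polls input and moves the state by hand. The names (MenuState, ALLOWED, Menu, and the input strings) are invented for this example and do not come from any particular engine.

```python
from enum import Enum, auto


class MenuState(Enum):
    MAIN = auto()
    OPTIONS = auto()
    LOBBY = auto()
    IN_GAME = auto()


# Hand-maintained transition table: which states may follow which.
# In C++ codebases this is the part that often hides behind macros.
ALLOWED = {
    MenuState.MAIN: {MenuState.OPTIONS, MenuState.LOBBY},
    MenuState.OPTIONS: {MenuState.MAIN},
    MenuState.LOBBY: {MenuState.MAIN, MenuState.IN_GAME},
    MenuState.IN_GAME: {MenuState.MAIN},
}


class Menu:
    def __init__(self):
        self.state = MenuState.MAIN

    def transition(self, target):
        """Move to target if the table allows it; return whether it happened."""
        if target not in ALLOWED[self.state]:
            return False
        self.state = target
        return True

    def tick(self, pressed):
        """Called every frame; polls input and drives the state by hand."""
        if self.state is MenuState.MAIN and pressed == "play":
            self.transition(MenuState.LOBBY)
        elif self.state is MenuState.LOBBY and pressed == "start":
            self.transition(MenuState.IN_GAME)
        elif pressed == "back":
            self.transition(MenuState.MAIN)
```

Note that the code owns everything here: the graph only exists implicitly, spread across the table and the branches inside tick.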

The problem compounds even more when networking is involved, such as in a two-player versus card game. Now we have three Tick-based state machines: one for each of the two players, and one for the server. Again, these exist entirely in code, in which state advances based on the success or failure of network events. Generally, these transitions are behind reliable, TCP-based game server remote procedure calls (RPCs) that require the programmer to have some semblance of this state machine in their head.

Within Tick, we can possibly render our current state to some debug screen, often written in an immediate mode renderer such as Dear ImGui. As before, this must be implemented for each individual state machine in the codebase, completely ad hoc. Depending on the development environment, there may not even be a server GUI, which means becoming a glorified crime scene analyst piecing together timestamps to track down elusive state transition bugs. We are missing a layer of abstraction that significantly speeds up development and iteration time of these state machines.

To put this succinctly, a second-class state machine is where code drives the state. A first-class state machine inverts this, where the state drives the code in an entirely data-driven manner, with editor integration for ease of authorship.
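To make the inversion concrete, here is a minimal Python sketch of a data-driven state machine, where the graph is plain data and the machine calls into code when a state is entered. Everything here (State, Transition, StateMachine, build_menu, and the context dictionary) is a hypothetical design for illustration, not the API of any existing engine or plugin. An editor would author the same data visually.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Transition:
    target: str
    condition: Callable[[dict], bool]  # evaluated against a context dict


@dataclass
class State:
    name: str
    on_enter: Optional[Callable[[dict], None]] = None
    transitions: List[Transition] = field(default_factory=list)


class StateMachine:
    def __init__(self, states, initial, context=None):
        self.states = {s.name: s for s in states}
        self.context = context if context is not None else {}
        self.history = []  # every state entered, in order, for debugging
        self.current = None
        self._enter(initial)

    def _enter(self, name):
        self.current = self.states[name]
        self.history.append(name)
        if self.current.on_enter:
            self.current.on_enter(self.context)  # the state drives the code

    def step(self):
        """Take the first transition whose condition holds, if any."""
        for t in self.current.transitions:
            if t.condition(self.context):
                self._enter(t.target)
                return True
        return False


def build_menu():
    """The whole menu flow expressed as data."""
    return StateMachine(
        states=[
            State("main",
                  on_enter=lambda ctx: ctx["log"].append("show main menu"),
                  transitions=[Transition("lobby", lambda ctx: ctx["input"] == "play")]),
            State("lobby",
                  on_enter=lambda ctx: ctx["log"].append("show lobby"),
                  transitions=[Transition("main", lambda ctx: ctx["input"] == "back")]),
        ],
        initial="main",
        context={"log": [], "input": None},
    )
```

Because the machine owns the current state and its history, a generic debugger or visualizer can be written once against StateMachine instead of once per enum.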

So what does a first-class state machine look like? Fortunately, there are some examples of this in the wild that are directionally correct. Unreal Engine extensively uses visual scripting for many of its components, but one in particular stands out. Skeletal animations are driven extensively by an animation graph, which is a visual state machine. The states light up for quick and easy visual debugging, and creating states/transitions is entirely authored via drag and drop. If there are many instances of that state machine, a specific instance can be selected for debugging. Breakpoints can even be set on specific states.

Unfortunately, this system is entirely constrained to animations. Unreal also features a visual scripting language called ‘Blueprint’, which is largely tick and event driven. This is no different from the Tick + manual tracking we outlined earlier.

Interestingly, somebody created a solution in the form of a plugin called Logic Driver Pro. This plugin allows for executing state machines, meaning we can escape Blueprints, and operate entirely from within state machines. Instead of creating an enum for which menu state we are in, we simply create a menu state machine. Each new screen is just a new state. There is no enum. It's just a graph. At any given time, it is obvious what menu state we are in. When transitioning to each state, it becomes trivial to spawn the desired menu widgets.

Transitions can be unconditional, they can be input driven, or they can be event driven. The state machine can execute Blueprints within each state, rather than Blueprints directly driving state. This inversion of ownership is precisely what I mean when I say first-class.

The beauty of this is that states and transitions can be whatever the developer decides. You can have a generic “cutscene state” that automatically disables physics and loads a cutscene file. You can have a “script state” that executes a script written in Lua or any other scripting language. You can have a “blueprint state” that executes a blueprint graph. Transitions are equally extensible. You can make an “RNG transition” that selects one of N transitions with weighted probabilities. You can make a “dialogue option chosen transition” for when the user picks the nth dialogue choice. The possibilities are nearly endless, and they serve certain genres especially well.
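As a sketch of how such custom transitions might look, here are two small Python classes: one for the weighted “RNG transition” and one for the “dialogue option chosen transition”. The class names and the pick method are hypothetical, chosen only for this example. The weighted selection uses random.choices from the Python standard library.

```python
import random


class WeightedTransition:
    """Hypothetical "RNG transition": picks one of N targets by weight."""

    def __init__(self, choices, rng=None):
        # choices: list of (target_state_name, weight) pairs
        if not choices or any(w <= 0 for _, w in choices):
            raise ValueError("need at least one choice, all with positive weight")
        self.targets = [t for t, _ in choices]
        self.weights = [w for _, w in choices]
        # An injectable RNG keeps results reproducible in tests and replays.
        self.rng = rng or random.Random()

    def pick(self):
        return self.rng.choices(self.targets, weights=self.weights, k=1)[0]


class DialogueOptionTransition:
    """Hypothetical transition that fires when the nth option is chosen."""

    def __init__(self, option_index, target):
        self.option_index = option_index
        self.target = target

    def pick(self, chosen_index):
        # Returns the target state name, or None if this transition does not fire.
        return self.target if chosen_index == self.option_index else None
```

For example, WeightedTransition([("common_drop", 9), ("rare_drop", 1)]) would send roughly one in ten visits to the rare state, and the probabilities live in the graph data rather than in gameplay code.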

While this is great, a true first-class state machine can go much further. What if you were designing a card game? In this scenario, we would want a state machine that travels across client and server boundaries. Again, if we were writing code, we would be manually tracking the state and advancing it via RPCs. If we built out a replicated, networked state machine, then state transitions can be marked as remote procedure calls. It becomes obvious when we are crossing a privilege boundary, and it becomes clear when we are doing too much extra networking. During local development, it also becomes very clear whether state is “stuck” on the client or the server.
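One way to picture this is to tag every state with the side that owns it, so that any transition between states with different owners is by definition an RPC. The Python sketch below does only that; it performs no real networking or replication. Side, NetState, NetTransition, describe, and the example card-game turn are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    CLIENT = "client"
    SERVER = "server"


@dataclass(frozen=True)
class NetState:
    name: str
    owner: Side  # which side executes this state's logic


@dataclass(frozen=True)
class NetTransition:
    source: NetState
    target: NetState

    @property
    def is_rpc(self):
        # A transition is an RPC exactly when it crosses the boundary.
        return self.source.owner is not self.target.owner


def describe(path):
    """Return one line per transition, flagging boundary crossings."""
    lines = []
    for source, target in zip(path, path[1:]):
        kind = "RPC" if NetTransition(source, target).is_rpc else "local"
        lines.append(f"{source.name} -> {target.name} [{kind}]")
    return lines


# Hypothetical single play within a card game turn.
CARD_TURN = [
    NetState("choose_card", Side.CLIENT),
    NetState("validate_play", Side.SERVER),
    NetState("resolve_effects", Side.SERVER),
    NetState("show_result", Side.CLIENT),
]
```

Calling describe(CARD_TURN) lists two RPC transitions and one local one, which is the kind of at-a-glance view of network chatter that a graph editor could render directly on the transition arrows.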

What is great is that this system generalizes incredibly well. For example, if we wanted to extend our networked state transitions to include backend API calls, this can be abstracted similarly. To give brief context, modern networked games are generally split into three distinct pieces: a game client, a game server, and a backend API. The client is the player's PC, the server holds transient game state (enemies and their health, or the current field state of a card game, etc.), and the backend API is for persistent game state (inventory, drops, achievements, etc.). While the client <-> server RPCs are tightly integrated, APIs are not part of the game engine. Therefore most codebases have to roll their own calls to backend APIs, often wrapping them with custom exponential backoff logic. This is another state machine in disguise, one that developers often botch by failing to handle retries, by not gracefully handling a failed response, and so on. Encapsulating this within a state machine would have saved me a great deal of developer hours in my career.
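To show the state machine hiding inside retry logic, here is a minimal Python sketch of a backend call with exponential backoff, written as explicit states. The names (CallState, call_with_backoff, send) and the default delays are assumptions for this example. A production version would likely also catch exceptions, add jitter, and cap the delay.

```python
import time
from enum import Enum, auto


class CallState(Enum):
    REQUESTING = auto()
    WAITING = auto()    # backing off before the next attempt
    SUCCEEDED = auto()
    FAILED = auto()


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=None):
    """Run `send` until it succeeds or attempts run out.

    `send` is a hypothetical callable returning True on success.
    `sleep` is injectable so tests do not actually wait.
    Returns (final_state, trace), where trace records every state visited.
    """
    sleep = sleep or time.sleep
    state = CallState.REQUESTING
    trace = [state]
    attempt = 0
    while state not in (CallState.SUCCEEDED, CallState.FAILED):
        if state is CallState.REQUESTING:
            attempt += 1
            if send():
                state = CallState.SUCCEEDED
            elif attempt >= max_attempts:
                state = CallState.FAILED  # explicit failure state, never silent
            else:
                state = CallState.WAITING
        elif state is CallState.WAITING:
            # The delay doubles after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
            state = CallState.REQUESTING
        trace.append(state)
    return state, trace
```

The point of the explicit FAILED state is that the caller is forced to decide what happens next, which is exactly the case that hand-rolled retry loops tend to forget.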

Of course, there are some complexities that arise from this. For example, a client cannot just ask the server to transition itself to any state for any state machine. There are some basic checks the server can do, such as “is the client skipping over any server-side states that it should have executed?”. This also does not eliminate the need to understand networking architectures and costs, or to decide whether code should run on the client or the server. Additionally, if a state transition fails or times out after the UI has speculatively updated, this needs to be handled, either as an explicit transition or as code that runs in these failure scenarios.
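A minimal Python sketch of that kind of server-side check might look like the following: the server keeps the authoritative current state and rejects any requested transition that is out of sync or not an edge in the graph. TURN_GRAPH, ServerTurnMachine, and request_transition are hypothetical names, and the turn graph itself is an invented example.

```python
# Hypothetical turn graph for a card game: state -> states reachable next.
TURN_GRAPH = {
    "draw_card": {"play_card"},
    "play_card": {"resolve_effects", "end_turn"},
    "resolve_effects": {"play_card", "end_turn"},
    "end_turn": {"draw_card"},
}


class ServerTurnMachine:
    def __init__(self, graph, initial):
        self.graph = graph
        self.current = initial  # the server's copy is authoritative

    def request_transition(self, claimed_source, target):
        """Validate a client-requested transition; return (ok, reason)."""
        if claimed_source != self.current:
            return False, "client is out of sync with server state"
        if target not in self.graph.get(self.current, set()):
            return False, "transition is not in the graph (skipped state?)"
        self.current = target
        return True, "ok"
```

A rejected request leaves the server state untouched, so the client can treat the rejection as its own explicit failure transition and roll back any speculative UI update.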

Additionally, writing a state machine debugger that spans a client/server boundary can be tricky, even if both are local. This is an acceptable cost, since the entire point is to push the complexity into a single system. This is not meant to be a replacement for all netcode, but instead just a drop-in replacement for a very specific netcode use case. Whatever price we pay in development of this infrastructure is absolutely worth it for the increased game development iteration speed.

We’re currently working on a suite of game engines that are hyper-specialized to their domains. Visual novel, deck builder, point-and-click, etc., where we will be proving out this concept and tightly integrating it into our engines. This means when creating your visual novel, the dialogue trees are a series of “dialogue states” with “dialogue option chosen transitions”. When creating your card game, the phases of the turns are obvious. The “draw card state” flows naturally into the “play card state”. There are a plethora of genres that benefit greatly from a data-oriented approach that has been, until now, widely neglected.

If you wish to follow our progress, follow me at https://x.com/zcanann