An agent runtime is five primitives. Most fake at least three.

Memory, time perception, commitment tracking, typed action, control plane — the missing infrastructure beneath the model. Most agent frameworks glue substitutes together and call it done.

Cover art for "An agent runtime is five primitives. Most fake at least three."

When teams talk about “the agent” they usually mean three things: the model, the prompt, and the tools. Pick a good model, write a careful prompt, expose a useful tool catalogue, and the demo is impressive. So is the second demo. By the third, the cracks start showing — not in any of those three, but underneath them, in the infrastructure nobody chose because nobody noticed they needed it.

That infrastructure is the agent runtime. It is not the model. It is not the tool layer. It is the layer between them that decides what the agent remembers across sessions, what it notices about the passage of time, what it has promised to come back to, how its actions are bounded and audited, and what supervises the whole loop. Most agent frameworks ship one or two of these and fake the rest with the prompt, the context window, or a tools.exec() call that takes a string.

There are five primitives. They’re separable on purpose.

flowchart LR
  Model([LLM model])
  Tools([Tool layer])
  subgraph Runtime[Agent runtime]
    Memory[Memory
Mnemos] Time[Time perception
Chronos] Commit[Commitment
Nous] Action[Typed action
Praxis] Control[Control plane
Olymp] end Model --> Runtime Runtime --> Tools Memory -. shared state .- Time Time -. signals .- Commit Commit -. plans .- Action Control -. supervises .- Memory Control -. supervises .- Time Control -. supervises .- Commit Control -. supervises .- Action

The five primitives.

You don’t compose these in a single library. They’re orthogonal concerns that compose in a runtime. A small enough agent might run without three of them. A production agent will eventually need all five, will discover that need during an incident, and will spend the next quarter retrofitting them.

What goes wrong when you fake them.

A partial list of failure modes I keep watching:

  1. Memory-as-buffer. The agent’s “memory” is the last 50k tokens of conversation history, summarized when it overflows. The summary is generated by the same model that needs to use it, on the same call. Nothing is structured, nothing is contradicted, nothing replays. The agent re-learns the user’s preferences every week.

  2. Time-blind agent. Every observation is treated as fresh. The agent doesn’t notice the third refund request from the same user in two weeks. It doesn’t notice the metric is back-sliding on a 14-day cycle. The model can technically reason about time when explicitly given timestamps; the runtime never gives them.

  3. Promise amnesia. User says “circle back Thursday.” Agent says “of course.” Thursday comes. Agent has no record that Thursday was a checkpoint. The commitment lived in the conversation, the conversation rolled out of context, the trust quietly degrades.

  4. Prose-as-API. The action layer is tools.exec({"name": "send_email", "args": "..."}) where args is a JSON-encoded string. There’s no schema. There’s no policy check. The “send_email” tool happily ships an email because the model produced a string that parsed. The next incident reveals the agent was sending emails on the wrong account for a week.

  5. The unsupervised loop. Two agents working on overlapping pieces. Each thinks it owns the migration. The control plane is goroutine.Go(func() { agent.Run(...) }). There is no supervisor, no shared state, no cancellation propagation. The first incident is when both agents commit conflicting changes; the postmortem is a ghost story.

  6. Fake control plane via dashboards. The team built a Grafana dashboard showing “active agents”. The dashboard reads logs. The logs are best-effort. The dashboard says everything is fine; production says otherwise; the gap is the missing primitive.

  7. Audit log = chat transcript. The “audit” of what the agent did is the conversation log. The actual tool calls are interleaved with reasoning, model output, prompt fragments. The compliance team is asked “what did the agent change last quarter” and answers with a search query and a sigh.

  8. Memory written by agent A, used by agent B, with no provenance. Multi-agent setup. Memory store is shared. Agent A wrote a “fact” three weeks ago that’s now load-bearing for Agent B’s planning. Nobody knows where the fact came from. Asking the original agent doesn’t help; the model that produced it has been updated twice since.

The pattern is the same in all eight: a primitive that should exist as a separate, observable, audit-able piece of infrastructure has been collapsed into the prompt, the conversation, or the orchestrator script. The collapse is convenient. The collapse is also the reason agent runtimes don’t survive scale.

The fix is to keep the primitives separate.

You don’t need my libraries. You need to recognize the primitives, decide which ones your runtime actually needs, and pick implementations that keep them legible.

I’ve been building each piece as a domain-agnostic Go library so they compose without locking into one runtime:

Each one is replaceable. The thesis is the composition: an agent runtime isn’t one thing, and pretending it is means the first three failures will look like model problems, framework problems, or “we need a better prompt.” They aren’t. They’re missing primitives.

If you’re starting from one of the eight failure modes above and don’t know which primitive to adopt first, the mapping is reliable:

Pick the one your incident report points at. Add the next within the quarter.