An agent runtime is five primitives. Most fake at least three.
Memory, time perception, commitment tracking, typed action, control plane — the missing infrastructure beneath the model. Most agent frameworks glue substitutes together and call it done.
When teams talk about “the agent” they usually mean three things: the model, the prompt, and the tools. Pick a good model, write a careful prompt, expose a useful tool catalogue, and the demo is impressive. So is the second demo. By the third, the cracks start showing — not in any of those three, but underneath them, in the infrastructure nobody chose because nobody noticed they needed it.
That infrastructure is the agent runtime. It is not the model. It is not the tool layer. It is the layer between them that decides what the agent remembers across sessions, what it notices about the passage of time, what it has promised to come back to, how its actions are bounded and audited, and what supervises the whole loop. Most agent frameworks ship one or two of these and fake the rest with the prompt, the context window, or a tools.exec() call that takes a string.
There are five primitives. They’re separable on purpose.
flowchart LR
    Model([LLM model])
    Tools([Tool layer])
    subgraph Runtime[Agent runtime]
        Memory[Memory<br/>Mnemos]
        Time[Time perception<br/>Chronos]
        Commit[Commitment<br/>Nous]
        Action[Typed action<br/>Praxis]
        Control[Control plane<br/>Olymp]
    end
    Model --> Runtime
    Runtime --> Tools
    Memory -. shared state .- Time
    Time -. signals .- Commit
    Commit -. plans .- Action
    Control -. supervises .- Memory
    Control -. supervises .- Time
    Control -. supervises .- Commit
    Control -. supervises .- Action
The five primitives.
- Memory. What the agent knows, who said it, when, and what the agent should do when two memories disagree. Not retrieval — that’s how you query memory. Memory is the structure being queried. Events, claims, contradictions, evidence, replay.
- Time perception. Pattern recognition across time-series. The agent that noticed a metric drifted for three weeks before it spiked. Recurrence, trend, spike, drop, stall, anomaly, seasonality, correlation. Without it, the agent treats every observation as standalone.
- Commitment tracking. Extracting promises from natural language — “I’ll get back to you Tuesday”, “we’ll close the loop next week”, “let me check with finance” — evaluating the risk that each promise is forgotten, intervening when it is. Without it, agents fall into the same trap humans do: the easy commitment becomes the dropped one.
- Typed action. Named, schema-validated, policy-checked, audit-logged actions. Not “the agent emits a function call and we shell out.” The action layer is where the agent meets the world; it should be the most rigorously typed part of the runtime, not a string-eval. Without it, actions are vibes.
- Control plane. The supervisor: which agents are running, what they’re doing, what they’re allowed to do, what budget they’ve burned, what state they’re in, who can pause or override them. Without it, the runtime is whatever the orchestrator script remembered to handle.
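To make the distinction between memory and retrieval concrete, here is a minimal sketch of memory as a structure: claims that carry who said them and when, and a store that surfaces disagreement instead of overwriting it. Every name in it (`Claim`, `Store`, `Contradictions`, the `day` helper) is hypothetical and chosen for illustration; it is not the Mnemos API.

```go
package main

import (
	"fmt"
	"time"
)

// Claim is one remembered statement together with its provenance.
// Hypothetical type for illustration only.
type Claim struct {
	Subject  string    // what the claim is about
	Value    string    // the asserted value
	Source   string    // who said it: a user, an agent, a tool
	Observed time.Time // when it was said
}

// Store keeps every claim per subject instead of overwriting the last one.
type Store struct {
	claims map[string][]Claim
}

func NewStore() *Store { return &Store{claims: make(map[string][]Claim)} }

// Record appends a claim. History is never rewritten, so it can be replayed.
func (s *Store) Record(c Claim) {
	s.claims[c.Subject] = append(s.claims[c.Subject], c)
}

// Contradictions returns all claims about a subject if any two of them
// disagree on the value, and nil if they agree or nothing is known.
func (s *Store) Contradictions(subject string) []Claim {
	cs := s.claims[subject]
	for i := 1; i < len(cs); i++ {
		if cs[i].Value != cs[0].Value {
			return cs
		}
	}
	return nil
}

// day builds a fixed timestamp so the example is deterministic.
func day(d int) time.Time {
	return time.Date(2024, time.March, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	s := NewStore()
	s.Record(Claim{Subject: "user:42/timezone", Value: "UTC+1", Source: "user:42", Observed: day(1)})
	s.Record(Claim{Subject: "user:42/timezone", Value: "UTC-5", Source: "agent:billing", Observed: day(20)})
	for _, c := range s.Contradictions("user:42/timezone") {
		fmt.Printf("%s says %s (%s)\n", c.Source, c.Value, c.Observed.Format("2006-01-02"))
	}
}
```

What the agent does with a contradiction (prefer the newer claim, prefer the user, ask) is a policy decision; the point of the sketch is only that the disagreement is visible and attributable.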
You don’t compose these in a single library. They’re orthogonal concerns that compose in a runtime. A small enough agent might run without three of them. A production agent will eventually need all five, will discover that need during an incident, and will spend the next quarter retrofitting them.
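One way to picture “orthogonal concerns that compose in a runtime” is a struct with one narrow interface per primitive, where a small agent simply leaves some fields unset. This is a sketch under my own assumptions: the interface names and method sets below are invented for illustration and are far smaller than anything a real runtime would need.

```go
package main

import "fmt"

// One hypothetical, deliberately tiny interface per primitive.
type Memory interface{ Recall(subject string) []string }
type TimePerception interface{ Patterns(series string) []string }
type Commitments interface{ Due() []string }
type Actions interface {
	Execute(name string, args map[string]any) error
}
type ControlPlane interface{ Allowed(agentID, action string) bool }

// Runtime wires the primitives together. A nil field means the
// runtime is running without that primitive.
type Runtime struct {
	Memory      Memory
	Time        TimePerception
	Commitments Commitments
	Actions     Actions
	Control     ControlPlane
}

// Missing lists the primitives this runtime does not have, which is
// the list you would rather read now than during an incident.
func (r Runtime) Missing() []string {
	var out []string
	if r.Memory == nil {
		out = append(out, "memory")
	}
	if r.Time == nil {
		out = append(out, "time perception")
	}
	if r.Commitments == nil {
		out = append(out, "commitment tracking")
	}
	if r.Actions == nil {
		out = append(out, "typed action")
	}
	if r.Control == nil {
		out = append(out, "control plane")
	}
	return out
}

// logActions is a stand-in action layer that only prints.
type logActions struct{}

func (logActions) Execute(name string, args map[string]any) error {
	fmt.Println("execute", name, args)
	return nil
}

func main() {
	small := Runtime{Actions: logActions{}}
	fmt.Println("running without:", small.Missing())
}
```

Because each field is an interface, any one implementation can be swapped without touching the other four.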
What goes wrong when you fake them.
A partial list of failure modes I keep seeing:
- Memory-as-buffer. The agent’s “memory” is the last 50k tokens of conversation history, summarized when it overflows. The summary is generated by the same model that needs to use it, on the same call. Nothing is structured, nothing is contradicted, nothing replays. The agent re-learns the user’s preferences every week.
- Time-blind agent. Every observation is treated as fresh. The agent doesn’t notice the third refund request from the same user in two weeks. It doesn’t notice the metric is back-sliding on a 14-day cycle. The model can technically reason about time when explicitly given timestamps; the runtime never gives them.
- Promise amnesia. User says “circle back Thursday.” Agent says “of course.” Thursday comes. Agent has no record that Thursday was a checkpoint. The commitment lived in the conversation, the conversation rolled out of context, the trust quietly degrades.
- Prose-as-API. The action layer is tools.exec({"name": "send_email", "args": "..."}) where args is a JSON-encoded string. There’s no schema. There’s no policy check. The “send_email” tool happily ships an email because the model produced a string that parsed. The next incident reveals the agent was sending emails on the wrong account for a week.
- The unsupervised loop. Two agents working on overlapping pieces. Each thinks it owns the migration. The control plane is goroutine.Go(func() { agent.Run(...) }). There is no supervisor, no shared state, no cancellation propagation. The first incident is when both agents commit conflicting changes; the postmortem is a ghost story.
- Fake control plane via dashboards. The team built a Grafana dashboard showing “active agents”. The dashboard reads logs. The logs are best-effort. The dashboard says everything is fine; production says otherwise; the gap is the missing primitive.
- Audit log = chat transcript. The “audit” of what the agent did is the conversation log. The actual tool calls are interleaved with reasoning, model output, prompt fragments. The compliance team is asked “what did the agent change last quarter” and answers with a search query and a sigh.
- Memory written by agent A, used by agent B, with no provenance. Multi-agent setup. Memory store is shared. Agent A wrote a “fact” three weeks ago that’s now load-bearing for Agent B’s planning. Nobody knows where the fact came from. Asking the original agent doesn’t help; the model that produced it has been updated twice since.
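For contrast with the Prose-as-API failure, here is a rough sketch of a send_email action that is typed, validated, policy-checked, and audit-logged before anything leaves the building. The types (`SendEmail`, `Policy`, `Executor`, `AuditEntry`) and the validation rules are assumptions made up for this example, not the Praxis API, and the actual send is an injected function so the sketch has no side effects.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SendEmail is a typed action: its fields are the schema.
type SendEmail struct {
	From, To, Subject string
}

// Validate is the schema check. The rules are intentionally crude.
func (a SendEmail) Validate() error {
	if !strings.Contains(a.From, "@") || !strings.Contains(a.To, "@") {
		return errors.New("send_email: from and to must be email addresses")
	}
	if a.Subject == "" {
		return errors.New("send_email: subject is required")
	}
	return nil
}

// Policy decides whether a well-formed action is allowed at all.
type Policy struct{ AllowedFrom map[string]bool }

func (p Policy) Check(a SendEmail) error {
	if !p.AllowedFrom[a.From] {
		return fmt.Errorf("send_email: account %q is not permitted", a.From)
	}
	return nil
}

// AuditEntry records every attempt, including rejected ones.
type AuditEntry struct {
	Action  string
	Args    SendEmail
	Outcome string
}

// Executor runs validate, policy check, and the side effect in that
// order, and appends exactly one audit entry per attempt.
type Executor struct {
	Policy Policy
	Audit  []AuditEntry
	Send   func(SendEmail) error // the real side effect, injected
}

func (e *Executor) Do(a SendEmail) error {
	err := a.Validate()
	if err == nil {
		err = e.Policy.Check(a)
	}
	if err == nil {
		err = e.Send(a)
	}
	outcome := "ok"
	if err != nil {
		outcome = err.Error()
	}
	e.Audit = append(e.Audit, AuditEntry{Action: "send_email", Args: a, Outcome: outcome})
	return err
}

func main() {
	ex := &Executor{
		Policy: Policy{AllowedFrom: map[string]bool{"support@example.com": true}},
		Send:   func(SendEmail) error { return nil }, // no-op for the sketch
	}
	err := ex.Do(SendEmail{From: "ceo@example.com", To: "user@example.com", Subject: "Refund"})
	fmt.Println("error:", err)
	fmt.Println("audit entries:", len(ex.Audit))
}
```

The wrong-account email is rejected before the side effect runs, and the rejection itself lands in the audit log, which also means “what did the agent change last quarter” becomes a query over `AuditEntry` values rather than a transcript search.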
The pattern is the same in all eight: a primitive that should exist as a separate, observable, auditable piece of infrastructure has been collapsed into the prompt, the conversation, or the orchestrator script. The collapse is convenient. The collapse is also the reason agent runtimes don’t survive scale.
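As one illustration of pulling a primitive out of the orchestrator script, here is a minimal supervisor sketch for the two-agents-one-migration case: it grants exclusive ownership of a resource, hands each agent a cancellable context, and can stop one agent or all of them. The `Supervisor` type and its methods are hypothetical, not the Olymp API, and the sketch assumes agents honor `ctx.Done()`. Ownership is never released here, which a real control plane would have to handle.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Supervisor is a hypothetical, minimal control plane.
type Supervisor struct {
	mu       sync.Mutex
	root     context.Context
	shutdown context.CancelFunc
	owners   map[string]string             // resource -> agent ID
	stops    map[string]context.CancelFunc // agent ID -> cancel
	wg       sync.WaitGroup
}

func NewSupervisor() *Supervisor {
	root, shutdown := context.WithCancel(context.Background())
	return &Supervisor{
		root:     root,
		shutdown: shutdown,
		owners:   make(map[string]string),
		stops:    make(map[string]context.CancelFunc),
	}
}

// Start runs an agent only if no other agent owns the resource.
func (s *Supervisor) Start(agentID, resource string, run func(context.Context)) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if owner, taken := s.owners[resource]; taken {
		return fmt.Errorf("%s is already owned by %s", resource, owner)
	}
	ctx, stop := context.WithCancel(s.root) // child of root: shutdown propagates
	s.owners[resource] = agentID
	s.stops[agentID] = stop
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		run(ctx)
	}()
	return nil
}

// Owner reports which agent owns a resource, or "" if none does.
func (s *Supervisor) Owner(resource string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.owners[resource]
}

// Stop cancels a single agent's context.
func (s *Supervisor) Stop(agentID string) {
	s.mu.Lock()
	stop, ok := s.stops[agentID]
	s.mu.Unlock()
	if ok {
		stop()
	}
}

// Shutdown cancels every agent and waits for them to return.
func (s *Supervisor) Shutdown() {
	s.shutdown()
	s.wg.Wait()
}

// idle is a stand-in agent loop that runs until cancelled.
func idle(ctx context.Context) { <-ctx.Done() }

func main() {
	sup := NewSupervisor()
	fmt.Println("agent-a:", sup.Start("agent-a", "db-migration", idle))
	fmt.Println("agent-b:", sup.Start("agent-b", "db-migration", idle))
	sup.Stop("agent-a") // pause or override a single agent
	sup.Shutdown()
}
```

The second `Start` fails with an error naming the current owner, so the conflict shows up at claim time instead of at commit time.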
The fix is to keep the primitives separate.
You don’t need my libraries. You need to recognize the primitives, decide which ones your runtime actually needs, and pick implementations that keep them legible.
I’ve been building each piece as a domain-agnostic Go library so they compose without locking into one runtime:
- Mnemos — memory & evidence layer
- Chronos — time & pattern perception
- Nous — commitment extraction & intervention
- Praxis — typed, policy-checked, audit-logged actions
- Olymp — runtime / control plane
Each one is replaceable. The thesis is the composition: an agent runtime isn’t one thing, and pretending it is means the first three failures will look like model problems, framework problems, or “we need a better prompt.” They aren’t. They’re missing primitives.
If you’re starting from one of the eight failure modes above and don’t know which primitive to adopt first, the mapping is reliable:
- Memory-as-buffer or memory written by A used by B → start with the evidence layer (Mnemos). It’s the most load-bearing primitive and the easiest to retrofit onto an agent that is already running.
- The unsupervised loop or fake control plane via dashboards → start with the control plane (Olymp). Once two agents share a runtime, the supervisor is non-negotiable.
- Prose-as-API or audit log = chat transcript → start with typed actions (Praxis). The action layer is where the agent meets the world; it should be the most rigorously typed part.
- Time-blind agent or repeated misses on patterns the data shows → start with time perception (Chronos).
- Promise amnesia or dropped follow-throughs → start with commitment tracking (Nous).
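To show how small a first step can be, here is a sketch of the Promise amnesia fix at its most basic: the commitment becomes a record with a due time that outlives the conversation, and something checks for overdue records. The names (`Commitment`, `Ledger`, `Overdue`, `day`) are hypothetical and not the Nous API. Extracting the promise and resolving “Thursday” to a date are assumed to happen upstream; risk scoring and intervention are left out.

```go
package main

import (
	"fmt"
	"time"
)

// Commitment is a promise lifted out of the conversation into its own
// record, so it survives the context window rolling over.
type Commitment struct {
	Text string
	Due  time.Time
	Done bool
}

// Ledger holds open and closed commitments.
type Ledger struct{ items []*Commitment }

// Add records a commitment and returns it so the caller can close it.
func (l *Ledger) Add(text string, due time.Time) *Commitment {
	c := &Commitment{Text: text, Due: due}
	l.items = append(l.items, c)
	return c
}

// Overdue returns open commitments whose due time is at or before now.
func (l *Ledger) Overdue(now time.Time) []*Commitment {
	var out []*Commitment
	for _, c := range l.items {
		if !c.Done && !c.Due.After(now) {
			out = append(out, c)
		}
	}
	return out
}

// day builds a fixed timestamp so the example is deterministic.
func day(d int) time.Time {
	return time.Date(2024, time.March, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	var l Ledger
	// "circle back Thursday", resolved upstream to 2024-03-07 (a Thursday).
	l.Add("circle back on the refund", day(7))
	for _, c := range l.Overdue(day(8)) {
		fmt.Println("needs follow-up:", c.Text)
	}
}
```

Passing `now` in as a parameter, rather than calling `time.Now()` inside `Overdue`, keeps the check testable and replayable.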
Pick the one your incident report points at. Add the next within the quarter.