A mental model for Claude Code (and every other modern agent)
I use Claude Code every day. So do millions of others. Gemini CLI, Codex — same story. They feel magical to use, and that magic was exactly what bothered me — I had real gaps in my understanding of how any of them actually worked. What does Claude Code do that the agents I built two years ago didn’t? What changed?
The answer turned out to be small. Every modern agent is a loop wrapped in a harness. But I didn’t see it until I built one myself.
I went looking first. Tutorials show one piece at a time. Frameworks bury the loop under layers of abstraction. Nobody had a clean picture of how the whole thing fits together — the kind you can hold in your head. The kind that lets you sketch a new agent on a napkin, or debug one at 2am.
I’d built agents before. The basic loop wasn’t new to me:
- Start with a user message.
- Build the context.
- Call the model.
- If the model asks for a tool, run it.
- Feed the result back into the context.
- Repeat until the model is done.
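The steps above fit in a few lines. Here is a runnable sketch, with `callModel` and `runTool` as stand-ins for a real model API and tool registry (both stubbed, since the loop's shape is the point, not any particular SDK):

```typescript
// Minimal agent loop. `callModel` and `runTool` are stand-ins for a real
// model API and a tool registry; they are stubbed here so the loop runs.
type ToolCall = { name: string; args: string };
type ModelReply = { text: string; toolCall?: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Stub model: asks for one tool call, then declares itself done.
function callModel(context: Message[]): ModelReply {
  const usedTool = context.some((m) => m.role === "tool");
  return usedTool
    ? { text: "done" }
    : { text: "", toolCall: { name: "echo", args: "hello" } };
}

// Stub tool runner.
function runTool(call: ToolCall): string {
  return `echo: ${call.args}`;
}

function runLoop(userMessage: string): Message[] {
  // Start with a user message; build the context.
  const context: Message[] = [{ role: "user", content: userMessage }];
  for (;;) {
    const reply = callModel(context); // call the model
    if (!reply.toolCall) {
      context.push({ role: "assistant", content: reply.text });
      return context; // the model is done
    }
    const result = runTool(reply.toolCall); // run the requested tool
    context.push({ role: "tool", content: result }); // feed the result back
  }
}
```

That is the whole loop: one user message in, a context that grows until the model stops asking for tools.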
I’ve written that loop more than once. So when I read about Claude Code or Gemini CLI, the loop was easy to spot. What I couldn’t see was the rest. The part that made them different.
The gap
I wrote the first article to force myself to have an answer. The mental model was right there in that piece — an agent is a loop inside a harness. The diagrams showed it. The words said it. But the code I shipped along with that article only had half: the loop with tools. The harness was on paper, not in code.
Modern agents do more than just call tools. They:
- Ask the user before running a sensitive tool
- Shrink old context when the conversation gets long
- Save state between turns so things don’t reset
- Route between different models for different jobs
- Recover when a step fails
That extra work doesn’t live inside the loop. It lives around it.
That outer layer is the harness. The loop runs the model. The harness governs the loop. Once that split was clear in my head, every modern agent looked the same — Claude Code, Gemini CLI, Codex, the open-source ones. A loop in the middle. A harness around it. A clean handoff between them. See the split once, and the code stops looking strange.
Here’s what one turn actually looks like. The harness can step in at five different moments:
- At the start of a turn. The harness opens the run — checks who the user is, loads their session, decides if the request is even allowed.
- Before the model is called. The harness can change what the model is about to see — shrink the history if it’s too long, add a fresh instruction, hide secrets.
- Before a tool runs. If the model wants to call a tool, the harness gets to decide first — does the user need to approve? Is the budget exceeded? Should this go somewhere else?
- After a tool returns. Before the result goes back into the loop, the harness sees it first — clean it up, log it, save what’s worth remembering.
- At the end of a turn. The harness gets the last word — save the conversation, send metrics, schedule whatever comes next.
Five moments. Five places to step in. That’s the whole vocabulary.
Clear in my head, but still not in code.
The build
So I built MARCO. The smallest readable thing I could write that makes the loop-and-harness split clear in code. Five hooks for the five moments above. Under a thousand lines of TypeScript.
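To make the five moments concrete, here is one plausible shape for the hooks as a TypeScript interface. The names and signatures are illustrative, not MARCO's actual API:

```typescript
// A hypothetical shape for the five hooks, one per moment in the turn.
// Names are illustrative, not MARCO's actual API.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: string };

interface Hooks {
  onTurnStart?(userId: string): void;                 // open the run
  beforeModel?(context: Message[]): Message[];        // reshape what the model sees
  beforeTool?(call: ToolCall): boolean;               // approve, deny, or redirect
  afterTool?(call: ToolCall, result: string): string; // clean, log, remember
  onTurnEnd?(context: Message[]): void;               // save, measure, schedule
}

// Example harness: trims long histories and blocks a destructive tool.
const hooks: Hooks = {
  beforeModel: (ctx) => ctx.slice(-20),        // keep only recent messages
  beforeTool: (call) => call.name !== "rm_rf", // deny destructive tools
  afterTool: (_call, result) => result.trim(),
};
```

Every hook is optional: a harness that implements none of them degrades back to the bare loop, which is exactly the handoff the architecture promises.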
I unpacked the architecture in a separate post — what each of the five hooks does, with state snapshots between every step of one real turn, plus the bugs that showed where the abstractions were doing real work. If you want to see the mental model play out in a working system, start there.
MARCO is on npm as marco-harness. Small on purpose, but not a toy — it’s the layer everything else here runs on.
Then I built marco-agent on top of the harness. It’s the agent I use in my own projects. Same architecture, plus everything you actually reach for once you start shipping:
- Streaming responses
- Connecting to outside tools (via MCP)
- Multi-turn conversations that don’t forget
- Auto-shrinking long histories
- Spending limits per turn
Those aren’t features I invented. Every modern agent ends up needing them. Claude Code, Gemini CLI, Codex — they’ve all landed on the same set. marco-agent is the simplest version I could write. It sits on top of the harness, not beside it.
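To show how one of those features falls out of the harness rather than the loop, here is a sketch of a per-turn spending limit as a before-tool check. The pricing numbers and `costOf` function are made up for illustration; this is not marco-agent's implementation:

```typescript
// Sketch: a per-turn spend limit expressed as a before-tool hook.
// The pricing table is invented; a real agent would meter tokens or API cost.
type ToolCall = { name: string; args: string };

function costOf(call: ToolCall): number {
  return call.name === "web_search" ? 0.01 : 0.001;
}

// Returns a before-tool check that tracks spend across one turn.
function makeBudgetGuard(limitUsd: number) {
  let spent = 0;
  return (call: ToolCall): boolean => {
    const cost = costOf(call);
    if (spent + cost > limitUsd) return false; // budget exceeded: deny the call
    spent += cost;
    return true;
  };
}
```

The loop never knows the budget exists. It proposes tool calls; the guard, living in the harness, quietly says no once the turn runs out of money.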
The mental model
The loop generates intentions. The harness gives them consequences. Together, they’re the agent.
The loop is the easy part — the model does the heavy lifting inside it, and that’s always been true. The harness is where the rest of the work lives. Seeing the split — not in diagrams but in working code — is what finally made the model click for me.
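The split fits in a dozen lines. In this miniature (all names illustrative), the loop's intentions arrive as tool calls, and the harness decides which ones become consequences:

```typescript
// The split in miniature: the loop proposes, the harness disposes.
// All names here are illustrative, not any package's real API.
type ToolCall = { name: string; args: string };

interface Harness {
  beforeTool(call: ToolCall): boolean;            // veto point
  afterTool(call: ToolCall, result: string): string; // shape the result
}

// Stand-in for the loop: a list of tool calls the model "intends" to make.
function runTurn(harness: Harness, intents: ToolCall[]): string[] {
  const results: string[] = [];
  for (const call of intents) {
    if (!harness.beforeTool(call)) continue; // harness vetoes the intention
    const raw = `${call.name}(${call.args})`; // stand-in for running the tool
    results.push(harness.afterTool(call, raw)); // harness shapes the consequence
  }
  return results;
}
```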
That’s what I want to give you. A mental model small enough to hold in your head — the kind that lets you read Claude Code’s source and recognize the pattern, debug your own agent at 2am, or sketch a new one on a napkin.
If you want to build with it: marco-harness is the harness, small enough to read in an afternoon. marco-agent is the practical version, built on top. Both MIT. Both on npm.
Read the code. Build with it. If you’d shape it differently, I want to know — I’m sure I’m missing things.
References
- How AI agents work: a control-flow breakdown — the first article. The mental model laid out in diagrams.
- MARCO: the loop inside a harness, in code — the architecture unpacked through one real turn, with state snapshots between every step and the bugs that proved the abstractions were doing real work.
- marco-harness on npm — the harness package.
- marco-agent on npm — the practical agent built on top.
- pyrotank41/MARCO on GitHub — source code.