How AI agents work: a control flow breakdown
Update, May 2026. Since this post, I’ve published the loop-inside-a-harness model as two npm packages: marco-harness (the harness) and marco-agent (a practical agent built on top). For an architecture walkthrough of one real turn, see MARCO: the loop inside a harness, in code. For the story behind both, and the mental model the build crystallized, see A mental model for Claude Code (and every other modern agent).
The standard AI agent explanation gives you a bag of parts:
- model
- tools
- memory
- reasoning
- human-in-the-loop
That list is fine as far as it goes. But it hides the most important thing: the control flow.
If I tell you a car has an engine, wheels, brakes, and a steering wheel, I have named the parts. I still have not explained how the car moves.
Most agent writing has the same problem. It names parts. It does not show the engine.
The clearest way to understand an AI agent is not as a bag of parts. It is as a loop inside a harness. The loop does the work. The harness controls when it runs, what it can do, and what happens with the result.
The inner loop
The inner loop is the core runtime — the part that actually does the work. It takes an input, calls the model, uses tools if needed, and keeps going until it has an answer.
A simple way to read that diagram is:
- build context
- call the model
- if the model asks for a tool, request execution from the harness
- harness checks permissions, runs it, and returns the result
- add the result back into context
- call the model again
- stop when there is a final answer
That is the engine.
Without the loop, a prompt can only answer from what is already in context. The loop is what lets the system reach beyond that — use a tool, see the result, update context, and keep going.
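The steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the companion repo's code: the type names, the `runInnerLoop` function, the `maxSteps` cap, and the scripted stand-in model are all assumptions made for the example.

```typescript
// Hypothetical types for illustration; not the API of any real library.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelReply =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };
type ContextItem = { role: "system" | "user" | "assistant" | "tool"; content: string };

// The model: context in, next step out.
type CallModel = (context: ContextItem[]) => ModelReply;
// The harness: checks permissions, runs the tool, returns the result as text.
type ExecuteTool = (call: ToolCall) => string;

function runInnerLoop(
  input: string,
  callModel: CallModel,
  executeTool: ExecuteTool,
  maxSteps = 10, // assumed safety cap so a confused model cannot loop forever
): string {
  // Build context.
  const context: ContextItem[] = [
    { role: "system", content: "You are a helpful agent." },
    { role: "user", content: input },
  ];
  for (let step = 0; step < maxSteps; step++) {
    // Call the model.
    const reply = callModel(context);
    // Stop when there is a final answer.
    if (reply.kind === "final") return reply.text;
    // Otherwise request execution from the harness...
    context.push({ role: "assistant", content: JSON.stringify(reply.call) });
    const result = executeTool(reply.call);
    // ...and add the result back into context before the next model call.
    context.push({ role: "tool", content: result });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}

// A scripted stand-in for a real model: asks for the clock once, then answers.
const scriptedModel: CallModel = (context) => {
  const last = context[context.length - 1];
  return last !== undefined && last.role === "tool"
    ? { kind: "final", text: `The time is ${last.content}.` }
    : { kind: "tool_call", call: { tool: "clock", args: {} } };
};
const fakeHarness: ExecuteTool = (call) => (call.tool === "clock" ? "12:00" : "unknown tool");

const answer = runInnerLoop("What time is it?", scriptedModel, fakeHarness);
// answer is "The time is 12:00."
```

Note that the loop never runs a tool itself. It only receives `executeTool` from outside, which is where the harness plugs in.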
Inside the inner loop
That was the high-level view. Now let’s walk through the diagram.
Build Context is the starting point. It means gathering everything the model can see for this turn:
- conversation history
- retrieved information and memory
- available tool definitions
- previous tool results
- system instructions
If something is not in context, the model cannot use it.
Those inputs feed into Assemble Prompt — the step where the system combines them into the actual prompt that gets sent to the model.
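As a rough sketch of Build Context feeding Assemble Prompt, the five inputs can be modeled as one object that gets flattened into a single string. The `TurnInputs` shape, the `assemblePrompt` function, and the section titles are hypothetical. Real model APIs usually take structured messages rather than one string, so treat this as an illustration of the idea only.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real library.
type TurnInputs = {
  systemInstructions: string;
  toolDefinitions: { name: string; description: string }[];
  retrieved: string[]; // retrieved information and memory
  conversation: { role: "user" | "assistant"; text: string }[];
  toolResults: { tool: string; output: string }[];
};

// Combine every input into the single prompt the model will see this turn.
function assemblePrompt(inputs: TurnInputs): string {
  const tools = inputs.toolDefinitions.map((t) => `- ${t.name}: ${t.description}`);
  const turns = inputs.conversation.map((m) => `${m.role}: ${m.text}`);
  const results = inputs.toolResults.map((r) => `${r.tool} -> ${r.output}`);
  const sections: [string, string[]][] = [
    ["System instructions", [inputs.systemInstructions]],
    ["Available tools", tools],
    ["Retrieved information and memory", inputs.retrieved],
    ["Conversation history", turns],
    ["Previous tool results", results],
  ];
  return sections
    .filter(([, lines]) => lines.length > 0) // skip sections with nothing in them
    .map(([title, lines]) => `## ${title}\n${lines.join("\n")}`)
    .join("\n\n");
}

const assembledPrompt = assemblePrompt({
  systemInstructions: "Answer briefly.",
  toolDefinitions: [{ name: "search", description: "Search the docs" }],
  retrieved: ["The user prefers metric units."],
  conversation: [{ role: "user", text: "How far is the station?" }],
  toolResults: [], // first turn: no tool has run yet
});
```

Anything left out of `TurnInputs` never reaches `assembledPrompt`, which is the point of the rule above: if it is not in context, the model cannot use it.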
Then the model is called. In the diagram, that node is Call LLM Token Generator — the model takes context in and generates the next tokens out.
After that comes the decision point: Need Tool Call? This is where reasoning lives: the model decides whether to output a tool call or return a final answer. If no, the loop ends and returns a response. If yes, the inner loop sends a Request Tool Execution. It names the tool and the arguments, then hands the request to the harness.
The dotted lines in the diagram show that boundary: the request crosses out of the loop, and the result crosses back in. The inner loop does not execute tools itself. The harness owns that step. It checks permissions, runs the call if allowed, and returns the result. The loop sees only the outcome, feeds it back into context, and continues.
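One way to picture the harness side of that boundary is a function that takes a request and returns only an outcome. The sketch below is an assumption-laden illustration: `ToolRequest`, `ToolOutcome`, `createToolExecutor`, and the `add` and `delete_file` tools are invented for the example, and a simple allow-list stands in for whatever permission rules a real harness would apply.

```typescript
// Hypothetical types and names, for illustration only.
type ToolRequest = { tool: string; args: Record<string, unknown> };
type ToolOutcome = { ok: true; output: string } | { ok: false; error: string };
type ToolImpl = (args: Record<string, unknown>) => string;

// Harness side of the boundary: look up the tool, check permission, run it.
// The inner loop only ever sees the ToolOutcome that comes back.
function createToolExecutor(tools: Map<string, ToolImpl>, permitted: Set<string>) {
  return (request: ToolRequest): ToolOutcome => {
    const impl = tools.get(request.tool);
    if (impl === undefined) return { ok: false, error: `unknown tool: ${request.tool}` };
    if (!permitted.has(request.tool)) return { ok: false, error: `not permitted: ${request.tool}` };
    try {
      return { ok: true, output: impl(request.args) };
    } catch (err) {
      // A failing tool becomes a result the model can read, not a crash in the loop.
      return { ok: false, error: `tool failed: ${String(err)}` };
    }
  };
}

const executeTool = createToolExecutor(
  new Map<string, ToolImpl>([
    ["add", (args) => String(Number(args["a"]) + Number(args["b"]))],
    ["delete_file", () => "deleted"],
  ]),
  new Set(["add"]), // only "add" is allowed in this sketch
);

const allowedOutcome = executeTool({ tool: "add", args: { a: 2, b: 3 } });
const blockedOutcome = executeTool({ tool: "delete_file", args: {} });
// allowedOutcome is { ok: true, output: "5" }
// blockedOutcome is { ok: false, error: "not permitted: delete_file" }
```

Returning a denial as an ordinary outcome is a design choice in this sketch: the refusal flows back into context like any other tool result, so the model can react to it on the next call.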
That covers the diagram. But remember memory from the bag of parts at the top? It lives inside this same loop.
Where memory fits
Memory is just data the loop can read and write through tools. Retrieved memory goes into context. New information gets saved back out through a tool call. Memory is not a special primitive. It is state — managed by the same loop.
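To make "memory is just state behind tools" concrete, here is a small sketch with two tools over an in-memory `Map`. The tool names `memory_write` and `memory_read`, the `MemoryArgs` shape, and `createMemoryTools` are all hypothetical. A real system would likely back this with a file, database, or vector store.

```typescript
// Hypothetical tool names and store; an in-memory Map stands in for real storage.
type MemoryArgs = { key: string; value?: string };
type MemoryTool = (args: MemoryArgs) => string;

function createMemoryTools(): Record<"memory_write" | "memory_read", MemoryTool> {
  const store = new Map<string, string>();
  return {
    // Saving is a tool call: new information goes out of the loop into the store.
    memory_write: ({ key, value }) => {
      store.set(key, value ?? "");
      return `saved ${key}`;
    },
    // Retrieval is also a tool call: the result is text that lands in context.
    memory_read: ({ key }) => store.get(key) ?? `nothing stored under ${key}`,
  };
}

const memory = createMemoryTools();
const contextLines: string[] = [];
contextLines.push(memory.memory_write({ key: "units", value: "metric" }));
contextLines.push(memory.memory_read({ key: "units" }));
// contextLines is ["saved units", "metric"]
```

Nothing here is special to memory. Both functions have the same shape as any other tool the harness might run.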
That is the complete inner loop — context, tools, memory, reasoning, all running inside one repeating cycle. Before we move to the outer loop, one thing is worth calling out.
Better models do not change the loop
When people say a model is better, they mean the outputs improve — sharper reasoning, better decisions. That is true. Swap a weaker model for a stronger one and things get better. But the loop does not change.
The inner loop is the engine. It can do work. But it cannot decide when to start, what it is allowed to do, whether a human needs to approve something, or what to do with the result.
Those are structural questions. And they belong to the outer loop.
The outer loop
The outer loop — the harness — is everything that wraps around the inner loop to make it usable in the real world. It decides when the loop runs, what it is allowed to do, whether a human needs to sign off, and what happens with the result.
The key move in this diagram is that the inner loop is one node. The harness is everything around it.
What the outer-loop nodes do
The outer loop starts with a Trigger. Something starts it:
- a user message
- a schedule
- an event
- a webhook
Then comes Permissions & rate limits. The system may have rules outside the inner loop. Some runs are allowed. Some are blocked. If a run is blocked, the system may Reject / Notify instead of entering the loop at all.
If the run is allowed, the system enters the Inner Loop.
Every tool call the inner loop makes crosses back out to the harness (dotted line out). The harness looks at what kind of tool call it is and routes accordingly. If it is an action call, the harness checks whether the call is auto-approved or needs sign-off, and surfaces Ask Human permission when sign-off is needed. If it is a clarification call, there is nothing to approve: the harness surfaces Ask Human clarification directly, waits for the answer, and returns it as a tool result. Either way, the result crosses back in (dotted line in) and the loop continues.
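The routing described above can be sketched as a single function. Everything named here is an assumption for illustration: the `HumanChannel` interface, the `routeToolCall` function, the auto-approve set, and the scripted human who refuses purchases. The only name taken from this post is the `ask_human` tool.

```typescript
// Hypothetical types and names, for illustration only.
type ToolCall = { tool: string; args: Record<string, string> };
type HumanChannel = {
  approve: (call: ToolCall) => boolean; // "Ask Human permission"
  clarify: (question: string) => string; // "Ask Human clarification"
};

function routeToolCall(
  call: ToolCall,
  human: HumanChannel,
  autoApproved: Set<string>,
  runTool: (call: ToolCall) => string,
): string {
  // Clarification call: nothing to approve or execute, just ask and wait.
  if (call.tool === "ask_human") {
    return human.clarify(call.args["question"] ?? "");
  }
  // Action call: run it if auto-approved, otherwise ask for sign-off first.
  if (!autoApproved.has(call.tool) && !human.approve(call)) {
    return `denied: ${call.tool} needs sign-off and was not approved`;
  }
  return runTool(call);
}

// A scripted human: records every approval request and refuses purchases.
const approvalsAsked: string[] = [];
const scriptedHuman: HumanChannel = {
  approve: (call) => {
    approvalsAsked.push(call.tool);
    return call.tool !== "make_purchase";
  },
  clarify: () => "Use the staging database.",
};
const autoApprovedTools = new Set(["read_file"]);
const runTool = (call: ToolCall): string => `ran ${call.tool}`;

const clarified = routeToolCall(
  { tool: "ask_human", args: { question: "Which database?" } },
  scriptedHuman, autoApprovedTools, runTool,
);
const readResult = routeToolCall({ tool: "read_file", args: {} }, scriptedHuman, autoApprovedTools, runTool);
const purchaseResult = routeToolCall({ tool: "make_purchase", args: {} }, scriptedHuman, autoApprovedTools, runTool);
// clarified is "Use the staging database."
// readResult is "ran read_file"
// purchaseResult is "denied: make_purchase needs sign-off and was not approved"
```

All three branches return a plain string, so from the inner loop's side each one is just another tool result.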
When the loop reaches a final answer, the harness still has work to do. The system may Monitor + Log the result. It may ask whether Human review is required. If review is required, the output may be approved, rejected, retried, or escalated. If review is not required, the system can go straight to Deliver Output.
After delivery, one more decision: should the system Schedule next run? If yes, the outer loop can re-enter later. If not, the run is done.
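Putting the outer-loop nodes together, one run of the harness might look like the sketch below. The `HarnessHooks` interface and every hook name are hypothetical, and the review step is simplified: the text lists approve, reject, retry, and escalate as possible outcomes, while this sketch collapses them into approve or hold.

```typescript
// Hypothetical types and names, for illustration only.
type RunTrigger = { source: "user" | "schedule" | "event" | "webhook"; payload: string };
type RunOutcome =
  | { status: "rejected"; reason: string }
  | { status: "held_for_review"; output: string }
  | { status: "delivered"; output: string; scheduleNext: boolean };

type HarnessHooks = {
  isAllowed: (trigger: RunTrigger) => boolean; // Permissions & rate limits
  innerLoop: (input: string) => string; // the whole inner loop, as one node
  log: (line: string) => void; // Monitor + Log
  reviewRequired: (output: string) => boolean; // Human review?
  humanApproves: (output: string) => boolean;
  deliver: (output: string) => void; // Deliver Output
  shouldScheduleNext: (trigger: RunTrigger) => boolean; // Schedule next run?
};

function runHarness(trigger: RunTrigger, hooks: HarnessHooks): RunOutcome {
  if (!hooks.isAllowed(trigger)) {
    const reason = "blocked by permissions or rate limits";
    hooks.log(`rejected ${trigger.source} run: ${reason}`); // Reject / Notify
    return { status: "rejected", reason };
  }
  const output = hooks.innerLoop(trigger.payload);
  hooks.log(`inner loop finished for ${trigger.source} run`);
  if (hooks.reviewRequired(output) && !hooks.humanApproves(output)) {
    return { status: "held_for_review", output };
  }
  hooks.deliver(output);
  return { status: "delivered", output, scheduleNext: hooks.shouldScheduleNext(trigger) };
}

const logLines: string[] = [];
const deliveredOutputs: string[] = [];
const hooks: HarnessHooks = {
  isAllowed: (t) => t.source !== "webhook",
  innerLoop: (input) => `summary of ${input}`,
  log: (line) => { logLines.push(line); },
  reviewRequired: (output) => output.includes("invoice"),
  humanApproves: () => false,
  deliver: (output) => { deliveredOutputs.push(output); },
  shouldScheduleNext: (t) => t.source === "schedule",
};

const scheduledRun = runHarness({ source: "schedule", payload: "daily metrics" }, hooks);
const blockedRun = runHarness({ source: "webhook", payload: "daily metrics" }, hooks);
const heldRun = runHarness({ source: "user", payload: "the invoice" }, hooks);
```

Here `innerLoop` is a single hook among seven, which mirrors the diagram: the inner loop is one node, and the harness is everything around it.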
You may have noticed the diagram has two different paths that involve a human. That is not an accident.
Human-in-the-loop shows up in two different ways
People say “human in the loop” like it is one thing. In practice, it shows up in two ways.
1. Permission gates
The harness pauses a tool call and asks for approval. For example:
- sending a message
- making a purchase
- editing production data
- running a risky command
That is not the model asking for missing knowledge. That is the surrounding system enforcing a rule.
2. Clarification requests
In many frameworks, clarification is also implemented as a tool call: the model outputs something like ask_human({ question: "..." }). The harness intercepts it, surfaces the question, and returns the answer as a tool result.
The mechanism is the same as any other tool call. But the harness handles it differently — no permission check, no execution. Just a question and a wait. The answer flows back into context, and the loop keeps going.
From the outside, both cases can look similar. A human got involved. But they are not the same mechanism.
Both route through the harness as tool calls. The difference is who initiates. Permission gates are enforced by the harness — the model has no say in which calls trigger them. Clarifications are initiated by the model — it decides it needs information and outputs the tool call.
If you are debugging or designing, that distinction changes where you look.
This kind of layer confusion is not limited to human-in-the-loop. It runs through the entire agent conversation.
Why the workflow vs. agent debate never dies
A lot of confusion about agents is really confusion about layers.
Anthropic makes an explicit distinction between workflows and agents:
“Workflows are systems where LLMs and tools are orchestrated through predefined code paths.”
“Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.”
The exact boundary is less important than the bigger point: people are collapsing different kinds of systems into one vague word.
That is why the debate never ends. People say “agent” and mean different things:
- a tool-using loop driven by a model
- a workflow with some LLM calls in it
- a production system with retries, approvals, and monitoring
- a multi-agent orchestration layer
Those are related ideas. But they are not the same layer.
If you do not separate the inner loop from the outer loop, the whole topic starts to blur together. Then you get the endless arguments:
- is this an agent or just a workflow?
- is memory part of the agent or part of the system?
- are approvals part of the agent?
- what makes something agentic at all?
The loop-inside-a-harness framing does not solve every naming debate. But it does make the system much easier to reason about.
If you want to see the inner loop in code rather than diagrams, there is a companion repo.
The repo
This article has a companion repo: HowAIAgentsWork. It focuses on the inner loop: a small, inspectable reference you can read and run. Writing can drift. Code is harder to bluff.
Closing
The next time someone explains an AI agent as a list of parts, ask one question: where is the loop?
That question cuts through most of the confusion. The loop is the engine. The harness is what makes it usable. Once you see that split, everything else falls into place.
That is the model: a loop inside a harness.