Not everything called an agent is one

Everyone is building agents. At least that is what the README says.

I have spent the last few months working with these systems at different levels of complexity, and one thing became clear early on: the word “agent” is being used to describe things that are fundamentally different from each other. A system prompt with a role. A process that calls tools. A loop that runs on its own at 3am and takes action without anyone asking it to. All called agents.

Having the right vocabulary matters, not for pedantic reasons but because if you do not know what you are building, you cannot reason about its failure modes.

Here is how I think about it.

LLM with role or context

This is the starting point. You give the model a system prompt that defines who it is, what it knows, and how it should respond. It might have deep domain knowledge: your architecture guidelines, your color palette, your coding conventions. It will give much better answers than a vanilla LLM because of that context.

But it does not act. You send a message, it responds, and that is the end. There is no loop, no decision about what to do next. It is a very well-configured function call.
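
To make the "function call" point concrete, here is a minimal sketch in Python. The names fake_model and make_assistant are made up for illustration; fake_model stands in for whatever real model API you would call.

```python
from typing import Callable

# Hypothetical stand-in for a real model call:
# takes a system prompt and a user message, returns text.
ModelFn = Callable[[str, str], str]


def fake_model(system: str, user: str) -> str:
    return f"({system}) reply to: {user}"


def make_assistant(system_prompt: str, model: ModelFn) -> Callable[[str], str]:
    """Bind a role/context to a model. The result is just a function."""
    def ask(message: str) -> str:
        # One call in, one answer out. No loop, no tools, no next step.
        return model(system_prompt, message)
    return ask


reviewer = make_assistant("You are a code reviewer.", fake_model)
answer = reviewer("Is this name clear?")
```

However rich the system prompt gets, the shape stays the same: message in, text out.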

A lot of what gets called an “AI agent” in product demos is this. A chatbot with a detailed system prompt. Useful, but not an agent.

LLM with tools

Now it can act on the world. You give it functions it can call: read a file, query a database, hit an API. The model decides which tool to use and when. There is a loop, but a shallow one: reason, act, done.

This is more powerful, but it is still fragile when used autonomously. If something unexpected happens, it has no way to adapt. It will follow its instructions until it either finishes or gets stuck. Think of it as a script that can make decisions, but only along a narrow path.
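
A rough sketch of that shallow loop, under stated assumptions: the two tools and fake_choose_tool are hypothetical, and the latter stands in for the model's tool-selection step.

```python
from typing import Callable, Dict, Tuple

Tool = Callable[[str], str]


# Hypothetical tools; real ones would touch the file system or a database.
def read_file(path: str) -> str:
    return f"contents of {path}"


def query_db(query: str) -> str:
    return f"rows for {query}"


TOOLS: Dict[str, Tool] = {"read_file": read_file, "query_db": query_db}


def fake_choose_tool(message: str) -> Tuple[str, str]:
    """Stand-in for the model deciding which tool to call and with what."""
    if "file" in message:
        return ("read_file", "notes.txt")
    return ("query_db", message)


def run_once(message: str) -> str:
    name, arg = fake_choose_tool(message)  # reason
    result = TOOLS[name](arg)              # act
    return result                          # done: no retry, no re-planning
```

If the chosen tool raises or returns garbage, nothing in run_once notices. That is the narrow path.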

Agent

This is where the word starts to earn its meaning. An agent has an objective, a set of tools, and a genuine reasoning loop: observe the situation, decide what to do, act, observe the result, decide again. It adapts to what it finds. If the first approach does not work, it tries another.

The key difference from the previous level is not the tools. It is the loop and the capacity to handle the unexpected.
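
Here is a minimal sketch of such a loop. Everything in it is an assumption made for illustration: fake_decide stands in for the model's reasoning step, and the two tools are invented so that the first one fails.

```python
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, str]]  # (tool name, observed outcome)


def primary_api(key: str) -> str:
    raise RuntimeError("primary API is down")


def cache_lookup(key: str) -> str:
    return f"cached value for {key}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "primary_api": primary_api,
    "cache_lookup": cache_lookup,
}


def fake_decide(goal: str, history: History) -> Optional[str]:
    """Stand-in for the model: pick the next tool, or None when finished."""
    if any(outcome.startswith("ok:") for _, outcome in history):
        return None  # goal reached
    tried = {name for name, _ in history}
    for name in TOOLS:
        if name not in tried:
            return name  # adapt: try something not yet attempted
    return None  # out of options


def run_agent(goal: str, max_steps: int = 5) -> History:
    history: History = []
    for _ in range(max_steps):
        tool_name = fake_decide(goal, history)  # observe + decide
        if tool_name is None:
            break
        try:
            outcome = "ok: " + TOOLS[tool_name](goal)  # act
        except Exception as exc:
            outcome = f"error: {exc}"  # a failure is just another observation
        history.append((tool_name, outcome))
    return history
```

Calling run_agent("user-42") tries primary_api, records the error, and then falls back to cache_lookup. The tools are the same kind as before; what changed is that the result of each action feeds the next decision.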

An agent can also sit at very different points on an autonomy axis. A reactive agent waits for you to invoke it: you hand it a problem, it works through it, it gives you a result. A semi-autonomous or autonomous agent has its own trigger: a cron job, a webhook, an event. It runs without anyone asking it to. It acts, and you find out later.

Both are agents. The autonomy axis is separate from the capability axis.
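
One way to picture the two axes, as a sketch with made-up names: the agent function is identical in both cases, and only the caller changes. run_agent here is a placeholder for a real reasoning loop.

```python
from typing import List


def run_agent(goal: str) -> str:
    # Placeholder for a full agent loop; capability lives here.
    return f"handled: {goal}"


def on_user_request(problem: str) -> str:
    # Reactive: a person invoked it and is waiting for the result.
    return run_agent(problem)


def on_schedule(log: List[str]) -> None:
    # Autonomous: a cron job or webhook would call this. Nobody is
    # waiting, so the result only lands in a log you read later.
    log.append(run_agent("nightly cleanup"))
```

Swapping on_user_request for on_schedule changes nothing about what the agent can do. It only changes who is watching when it does it.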

Multi-agent

Here an orchestrator agent invokes specialized sub-processes to handle parts of a larger task. The orchestrator owns the goal and the decision of what to do next. The sub-processes are specialists: they do one thing well within a limited scope.

The sub-processes can themselves be any of the above. An orchestrator might invoke a plain LLM with deep domain context for validation (it does not need to act, it needs to know things), and a proper agent with tools for execution. The orchestrator does not care what they are underneath, only what they return.

This is what makes multi-agent systems powerful and also what makes them hard to debug: the behavior emerges from the interaction between components, not from any single one.
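
A small sketch of that arrangement, assuming hypothetical workers: validator stands in for a plain LLM with domain context, executor for an agent with tools. The orchestrator sees both through the same interface, and keeps a trace because the interaction is where the debugging happens.

```python
from typing import Callable, Dict, List

Worker = Callable[[str], str]


def validator(task: str) -> str:
    # Stand-in for a context-only LLM: it knows things, it does not act.
    return "valid" if "deploy" in task else "invalid"


def executor(task: str) -> str:
    # Stand-in for a proper agent with tools.
    return f"executed {task}"


WORKERS: Dict[str, Worker] = {"validate": validator, "execute": executor}


def orchestrate(task: str, workers: Dict[str, Worker]) -> List[str]:
    """Own the goal and the next-step decision; record every hand-off."""
    trace: List[str] = []
    verdict = workers["validate"](task)
    trace.append(f"validate -> {verdict}")
    if verdict == "valid":
        trace.append(f"execute -> {workers['execute'](task)}")
    return trace
```

With these stubs, orchestrate("deploy service", WORKERS) runs both steps, while orchestrate("delete prod", WORKERS) stops after validation. Neither worker alone explains that outcome; the trace does.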

Why it matters

The failure modes are different at each level. An LLM with bad context gives bad answers. An LLM with tools follows its instructions until it gets stuck on something unexpected. An agent with a broken reasoning loop gets stuck or goes off track. An autonomous agent fails silently, and by the time you notice, it has already acted on a bad decision several times.

Most of the “agent” disasters I have read about involved systems deployed as autonomous agents that were actually LLMs with tools: no real reasoning loop, but real-world side effects.
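
That pattern is simple enough to write down as a check. This is only a sketch of my own taxonomy, not a standard one: the enum names and the fits_disaster_pattern function are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    CONTEXT_ONLY = 1  # LLM with role or context
    TOOLS = 2         # LLM with tools, shallow loop
    AGENT = 3         # genuine reasoning loop


class Trigger(Enum):
    ON_REQUEST = "on_request"  # a person invokes it
    UNATTENDED = "unattended"  # cron job, webhook, event


@dataclass(frozen=True)
class SystemProfile:
    level: Level
    trigger: Trigger
    has_side_effects: bool


def fits_disaster_pattern(p: SystemProfile) -> bool:
    """Runs unattended and changes the world, but has no real reasoning loop."""
    return (
        p.trigger is Trigger.UNATTENDED
        and p.has_side_effects
        and p.level is not Level.AGENT
    )
```

A False here does not mean the system is safe. A real agent running unattended can still fail silently; the check only flags the specific mismatch between label and capability.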

This is also where evals become critical. If your system acts autonomously, you need a way to measure whether its decisions are sound before something goes wrong in production. That deserves its own post.

Know what you are building before you give it a cron job.