The Cleanest Pattern I've Found for Autonomous Agents: A Terminal Tool With No execute

Why ToolLoopAgent + a no-execute finalAnswer tool replaced 80 lines of brittle while-loop code in my research agent.

Every blog post I read about "autonomous LLM agents" in 2024 had roughly the same shape:

import { generateText } from 'ai';

let messages = initialMessages;
for (let i = 0; i < MAX_STEPS; i++) {
  // generateText resolves to plain values, so result.text is a string here
  const result = await generateText({ model, messages, tools });
  messages = [...messages, ...result.response.messages];

  // Heuristic exits: a magic phrase, or a turn with no tool calls
  if (result.text.includes('FINAL ANSWER')) break;
  if (result.finishReason === 'stop' && !result.toolCalls.length) break;
  // ...maybe other heuristics...
}

const finalText = parseFinalAnswer(messages);

This works. I shipped variants of it for over a year. But every time I came back to one of these loops six months later to fix something, I would find myself re-deriving:

When does this loop actually stop?
How does the parser distinguish "I'm done" from "I'm thinking out loud"?
What happens if the model hallucinates the magic phrase mid-thought?
Why is my final-answer extraction a regex over the last assistant message?

Then in early 2026 I built a research agent on top of Vercel AI SDK v6's ToolLoopAgent and a tiny pattern clicked into place that made all of those questions go away.

The pattern is: define your "I'm done" signal as a tool with no execute function.

That sentence sounds boring. Stay with me — it's the cleanest agentic-loop trick I've seen.

🔗 github.com/NeryC/research-agent · live demo

The agent I built

The use case was a familiar one: an LLM that takes a research question, searches the web, reads pages, and synthesizes a cited Markdown answer. Three tools:

searchWeb({ query }) — calls Exa's /search endpoint, returns 5 results with title, URL, and pre-extracted highlights.
readPage({ url }) — calls Exa's /contents, returns up to 8,000 characters of clean page text.
finalAnswer({ answer, sources }) — emits the final response to the user, with citations.

The model decides which tool to call, in what order, and when to stop. It can search, read a page, search again with refined terms, read another page, and finally produce its answer. Up to 8 steps total.

The pattern: a terminal tool that doesn't execute

In AI SDK v6, every tool is defined with tool() and a Zod input schema. Most tools also have an execute function that actually does the work:

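import { tool } from 'ai';
import { z } from 'zod';
// searchExa is the project's own Exa wrapper (lib/exa.ts)

export const searchWebTool = tool({
  description: 'Search the web for relevant pages on a topic.',
  inputSchema: z.object({
    query: z.string().describe('The search query'),
  }),
  // Runs the search and returns only the fields the model needs
  execute: async ({ query }) => {
    const results = await searchExa(query, 5);
    return results.map(r => ({ title: r.title, url: r.url, highlights: r.highlights }));
  },
});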

Now look at finalAnswer:

export const finalAnswerTool = tool({
  description: 'Submit the final answer to the user with structured citations.',
  inputSchema: z.object({
    answer: z.string(),
    sources: z.array(z.object({
      url: z.url(),
      title: z.string(),
      snippet: z.string().optional(),
    })),
  }),
  // ← intentionally no execute
});

No execute. Just a schema.

Inside ToolLoopAgent, the SDK sees that this tool has no executor. When the model decides to call it, the loop stops immediately. The tool's call is the result. The validated, fully typed payload ({ answer, sources }) is what gets returned to your code.
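
To make the stopping rule concrete, here is a minimal sketch of a loop that treats a tool without an executor as terminal. It is a toy model written for illustration, not the SDK's source: runToolLoop, ToolDef, and the scripted "model" are all hypothetical names, and the real loop does far more (message history, validation, streaming).

```typescript
// Toy model of a tool loop (NOT the AI SDK's implementation).
// All names here (ToolDef, runToolLoop, the scripted model) are hypothetical.

type ToolCall = { toolName: string; input: unknown };

type ToolDef = {
  description: string;
  // Optional on purpose: a tool without execute acts as the terminal signal.
  execute?: (input: unknown) => Promise<unknown>;
};

type LoopResult =
  | { kind: 'terminal'; call: ToolCall; steps: number }
  | { kind: 'step-limit'; steps: number };

async function runToolLoop(
  nextCall: (history: ToolCall[]) => ToolCall, // stand-in for the model
  tools: Record<string, ToolDef>,
  maxSteps: number,
): Promise<LoopResult> {
  const history: ToolCall[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const call = nextCall(history);
    const tool = tools[call.toolName];
    if (!tool) throw new Error(`Unknown tool: ${call.toolName}`);

    // No executor means there is no result to feed back, so the loop ends
    // and the call's input is handed to the caller as the final payload.
    if (!tool.execute) return { kind: 'terminal', call, steps: step };

    await tool.execute(call.input);
    history.push(call);
  }
  return { kind: 'step-limit', steps: maxSteps };
}

// A scripted "model": search, read, then finish.
const script: ToolCall[] = [
  { toolName: 'searchWeb', input: { query: 'example query' } },
  { toolName: 'readPage', input: { url: 'https://example.com' } },
  { toolName: 'finalAnswer', input: { answer: 'done', sources: [] } },
];

const demoTools: Record<string, ToolDef> = {
  searchWeb: { description: 'search', execute: async () => ['result'] },
  readPage: { description: 'read', execute: async () => 'page text' },
  finalAnswer: { description: 'finish' }, // intentionally no execute
};

const demoRun = runToolLoop((history) => script[history.length], demoTools, 8);
demoRun.then((r) => console.log(r.kind, r.steps)); // terminal 3
```

The only branch that matters is the check for a missing execute: the decision to stop is read off the tool definition, never off the model's text.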

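import { ToolLoopAgent, stepCountIs } from 'ai';

export const researchAgent = new ToolLoopAgent({
  model: 'anthropic/claude-sonnet-4.6', // Gateway model ID; must be a string
  instructions: SYSTEM_PROMPT,
  tools: {
    searchWeb: searchWebTool,
    readPage: readPageTool,
    finalAnswer: finalAnswerTool,
  },
  stopWhen: stepCountIs(8), // safety cap if finalAnswer is never called
});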

That's the whole agent.

No while. No for. No "did the model say it's done?" parsing. No regex over the response text. The model gets to decide between "search," "read," and "finish" using the same mechanism it uses for the other tools, and the loop terminates the moment it picks "finish."

Why this is better than the old pattern

Three reasons it stuck for me:

1. Termination is structural, not heuristic

In the old pattern, "when do we stop?" is decided by what the model says. You might match on the literal string "FINAL ANSWER" or check that the model's last message has no tool calls. Both are heuristics. They fail when the model rephrases, when it thinks out loud, or when it does both at once: calls a tool and also outputs prose that looks like a final answer.

In the no-execute pattern, "when do we stop?" is decided by which function the model calls. There is no string to parse. The decision lives in the model's tool-call output, which is a structured field, not free text.
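
A small sketch of why the string heuristic is fragile. The helper names and sample turns below are made up for illustration; the point is only that a substring check cannot tell a finished answer from a model thinking out loud, while a check on the tool name can.

```typescript
// Hypothetical shapes for illustration only.
type AssistantTurn = {
  text: string;
  toolCalls: { toolName: string }[];
};

// Old style: stop when the text contains a magic phrase.
function stopsByPhrase(turn: AssistantTurn): boolean {
  return turn.text.includes('FINAL ANSWER');
}

// Structural style: stop only when the terminal tool was called.
function stopsByToolCall(turn: AssistantTurn): boolean {
  return turn.toolCalls.some((c) => c.toolName === 'finalAnswer');
}

// The model is still planning, but happens to mention the phrase.
const thinkingOutLoud: AssistantTurn = {
  text: 'Before I write the FINAL ANSWER I should read one more page.',
  toolCalls: [{ toolName: 'readPage' }],
};

// The model is done, but never wrote the phrase.
const rephrased: AssistantTurn = {
  text: '',
  toolCalls: [{ toolName: 'finalAnswer' }],
};

console.log(stopsByPhrase(thinkingOutLoud));   // true  (false positive)
console.log(stopsByToolCall(thinkingOutLoud)); // false
console.log(stopsByPhrase(rephrased));         // false (missed exit)
console.log(stopsByToolCall(rephrased));       // true
```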

2. The output is fully typed

When the loop stops, you get the finalAnswer tool's input — already parsed, already validated against the Zod schema. There is no separate "extract the final answer" step.

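const result = await researchAgent.generate({ messages });
// toolCalls holds the calls from the last step; for a finished run that is finalAnswer
const call = result.toolCalls.find(c => c.toolName === 'finalAnswer');
// call?.input is the validated { answer: string, sources: Source[] } payload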

Compare to the old pattern, where you'd have to scan messages for the tag, slice out the JSON, parse it, and handle the case where the model wrote prose instead of JSON.
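
For contrast, this is roughly what the "scan, slice, parse" step tends to look like. It is a sketch of the old approach, not code from the repo: the <final> tag convention and the extractFinalAnswer name are assumptions. Note how every failure mode has to be handled by hand.

```typescript
// Hypothetical sketch of text-based extraction (the approach being replaced).
type Source = { url: string; title: string; snippet?: string };
type FinalAnswer = { answer: string; sources: Source[] };

function extractFinalAnswer(lastMessage: string): FinalAnswer | null {
  // Assumed convention: the model wraps JSON in <final>...</final> tags.
  const match = lastMessage.match(/<final>([\s\S]*?)<\/final>/);
  if (!match) return null; // the model forgot the tag

  let parsed: unknown;
  try {
    parsed = JSON.parse(match[1]);
  } catch {
    return null; // the model wrote prose instead of JSON
  }

  // Manual shape check; a schema library would do this for you.
  if (typeof parsed !== 'object' || parsed === null) return null;
  const candidate = parsed as { answer?: unknown; sources?: unknown };
  if (typeof candidate.answer !== 'string') return null;
  if (!Array.isArray(candidate.sources)) return null;
  return { answer: candidate.answer, sources: candidate.sources as Source[] };
}

const ok = extractFinalAnswer('<final>{"answer":"42","sources":[]}</final>');
const prose = extractFinalAnswer('<final>The answer is 42.</final>');
const untagged = extractFinalAnswer('{"answer":"42","sources":[]}');
console.log(ok?.answer, prose, untagged); // 42 null null
```

Two of the three inputs fail silently, and the caller has to decide what a null means. With the terminal tool, that whole function disappears.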

3. The agent loop is declarative

ToolLoopAgent({ model, tools, stopWhen }) reads as a complete description of what the agent does. There's no hidden loop body, no scattered exit conditions, no edge-case handling spread across helper functions. If I come back to this code a year from now, I'll know exactly what it does in five seconds.

What stepCountIs(8) means

stopWhen: stepCountIs(8) is the safety net. If the model never calls finalAnswer (for example, it gets stuck in a search loop), the SDK halts the agent after 8 steps, where each step is one model call plus the tool executions it triggers. This caps both compute cost and the risk of a runaway loop.

For the research agent, 8 steps is plenty. A typical session:

Step 1: searchWeb({ query: "Vercel AI Gateway vs LiteLLM" })   → 5 results
Step 2: readPage({ url: "https://vercel.com/.../ai-gateway" }) → 8000 chars
Step 3: searchWeb({ query: "LiteLLM features pricing 2024" })  → 5 results
Step 4: readPage({ url: "https://docs.litellm.ai/..." })       → 8000 chars
Step 5: finalAnswer({ answer: "...", sources: [...] })         → loop terminates

If the question is harder, the model uses more steps. If it never converges on an answer, the cap ends the run.
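
The interplay between the terminal tool and the cap can be sketched in a few lines. This is a toy illustration with hypothetical names (stepCap, runUntil), not the SDK's code; it assumes a stop condition is a predicate over the steps completed so far.

```typescript
// Toy illustration of a step cap (hypothetical names, not the SDK's code).
type Step = { toolName: string };
type StopCondition = (state: { steps: Step[] }) => boolean;

// Mirrors the idea behind stepCountIs(n): stop once n steps have completed.
const stepCap = (n: number): StopCondition => ({ steps }) => steps.length >= n;

function runUntil(nextTool: () => string, stop: StopCondition): Step[] {
  const steps: Step[] = [];
  while (true) {
    const toolName = nextTool();
    steps.push({ toolName });
    if (toolName === 'finalAnswer') break; // terminal tool ends the run
    if (stop({ steps })) break;            // safety net ends the run
  }
  return steps;
}

// A "model" stuck in a search loop never reaches finalAnswer.
const stuck = runUntil(() => 'searchWeb', stepCap(8));
console.log(stuck.length); // 8
```

A run that hits the cap ends without a finalAnswer payload, so the calling code should treat that case as "no answer" rather than assume the payload is always there.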

A few practical extras

Streaming the agent's reasoning to the browser

ToolLoopAgent.stream() returns a result whose toUIMessageStreamResponse() method produces a UI message stream that AI SDK v6's useChat hook consumes natively:

// Server side: Route Handler
import { convertToModelMessages, type UIMessage } from 'ai';

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = await researchAgent.stream({
    messages: await convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}

// Client side
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';

const { messages, sendMessage } = useChat({
  transport: new DefaultChatTransport({ api: '/api/research' }),
});

Each tool call streams to the browser as it happens — searchWeb start → searchWeb result → readPage start → … — and the UI renders a card for each one in real time. No polling, no manual SSE plumbing on the client; the hook handles it.

Vercel AI Gateway over the direct Anthropic SDK

I route the model call through Vercel AI Gateway instead of using the Anthropic SDK directly. Three reasons:

One API key for everything. I can swap from claude-sonnet-4.6 to claude-haiku-4.5 to gpt-5 by changing a string. No new key management.
Automatic provider failover. If Anthropic has an outage, the Gateway re-routes to Bedrock or Vertex without me touching anything.
Cost dashboard. Every request is logged with cost, latency, and token usage. Useful when you're trying to figure out which agent is burning your budget.

The model constant lives in lib/agent/model.ts:

export const RESEARCH_MODEL = 'anthropic/claude-sonnet-4.6';

That's it. The SDK resolves the model string through the Gateway, which authenticates with the AI_GATEWAY_API_KEY env var.

Why Exa instead of Google/Bing

I don't strictly need a dedicated search API: I could scrape Google's HTML or use a SerpAPI-style proxy. But Exa returns clean text content directly, with no raw HTML to strip, no anti-bot measures to work around, and no JavaScript-heavy pages to render. For an LLM that needs to read content, this is a huge time-saver. The whole lib/exa.ts file is ~50 lines.

Rate limiting per IP

The agent makes a Sonnet call per step plus an Exa call per searchWeb/readPage. That gets expensive fast on a public demo. I use a simple in-memory Map<string, Bucket> rate limiter keyed by IP, capped at 5 queries per IP per hour. For a portfolio demo this is sufficient; in production you'd swap to Upstash Redis so the limit is global across instances.

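type Bucket = { count: number; resetAt: number };
const buckets = new Map<string, Bucket>();

export function rateLimit(key: string, opts: { max: number; windowMs: number }) {
  const now = Date.now();
  const existing = buckets.get(key);
  // First request from this key, or the previous window expired: start a new window
  if (!existing || existing.resetAt <= now) {
    buckets.set(key, { count: 1, resetAt: now + opts.windowMs });
    return { allowed: true, remaining: opts.max - 1 };
  }
  if (existing.count >= opts.max) return { allowed: false, remaining: 0 };
  existing.count += 1;
  return { allowed: true, remaining: opts.max - existing.count };
}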
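
One way a limiter like this might be wired in front of the route handler, as a sketch. The x-forwarded-for parsing, the clientKey and guard names, and the 429 payload are assumptions for illustration, not code from the repo; the limiter is passed in as a parameter so the snippet stands alone.

```typescript
// Hypothetical wiring for a per-IP limiter; names and header choice are assumptions.
type HeaderBag = { get(name: string): string | null | undefined };
type LimitResult = { allowed: boolean; remaining?: number };
type Limiter = (key: string, opts: { max: number; windowMs: number }) => LimitResult;

// Behind a proxy the client IP is usually the first x-forwarded-for entry.
function clientKey(headers: HeaderBag): string {
  const forwarded = headers.get('x-forwarded-for');
  const first = forwarded?.split(',')[0]?.trim();
  return first || 'unknown';
}

// Returns a rejection descriptor, or null when the request may proceed.
function guard(headers: HeaderBag, limit: Limiter) {
  const key = clientKey(headers);
  // 5 queries per hour, matching the limit described above
  const { allowed } = limit(key, { max: 5, windowMs: 60 * 60 * 1000 });
  return allowed ? null : { status: 429, body: 'Rate limit exceeded. Try again later.' };
}

// Usage with a stand-in limiter that allows the first two calls per key.
const seen = new Map<string, number>();
const fakeLimit: Limiter = (key) => {
  const count = (seen.get(key) ?? 0) + 1;
  seen.set(key, count);
  return { allowed: count <= 2 };
};

const hdrs = new Map([['x-forwarded-for', '203.0.113.7, 10.0.0.1']]);
console.log(clientKey(hdrs));                // 203.0.113.7
console.log(guard(hdrs, fakeLimit));         // null
console.log(guard(hdrs, fakeLimit));         // null
console.log(guard(hdrs, fakeLimit)?.status); // 429
```

The forwarded header is only trustworthy when a proxy you control sets it; a client talking to the server directly can put anything there.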

What I learned

Three takeaways I'll carry into every future agent:

Make termination structural. If your loop's exit condition is "the model said the magic phrase," you have a brittle agent. Make it a tool call instead. The model still gets to decide; you get a clean signal and a typed payload.
Lean on ToolLoopAgent (or the equivalent in your framework) instead of hand-rolling loops. A declarative agent definition is easier to read, easier to debug, and easier to hand off to a teammate. The bespoke while loop is almost never worth its cost.
Schemas everywhere. Same lesson as my multi-agent code reviewer post: the schema is the contract. Once you internalize that LLM outputs should be validated typed data, not free-form text, every agent you build gets cleaner.

The whole research agent — UI, server route, agent definition, three tools, rate limiter, 14 tests — fits in roughly 700 lines of TypeScript. The README walks through every file. Steal it for your own project.

🔗 github.com/NeryC/research-agent
🔗 Live demo: research-agent-three-pi.vercel.app
