How to Run an Agent Loop Without Burning Your Token Budget

There’s a diagram making the rounds that splits the world into “prompt engineering” and “loop engineering.” The pitch: stop writing prompts one at a time and let the agent drive. Set a goal, fire a trigger, let the agent act, check whether the goal is met, and repeat until it is.

The pattern is real. An autonomous agent running its own act-check-repeat cycle is the core loop behind every agentic coding tool, including the one I use every day. The shape isn’t new either: the reason-act-observe loop was formalized as ReAct back in 2022. What the picture skips are the two decisions that determine whether a loop earns its keep or quietly drains your account.

A loop is cheap to start and expensive to run. The moment you hand an agent the keys, you trade a problem you understand — writing the next prompt — for two that are harder: framing the goal so the agent can actually reach it, and defining the exit so it stops before the cost outruns the value. Get those wrong and you haven’t automated your work. You’ve built a slot machine and pointed it at your API bill.

The question is the hard part, not the loop

Look at that flowchart again and find the diamond labeled “Goal met?” That one box does all the work, and it’s drawn as a trivial yes/no. It isn’t. The entire economics of the loop live there.

A goal an agent can check is a goal an agent can finish. “Make every test in e2e/ pass without editing the test files” has a built-in pass/fail. The agent runs the suite, reads the result, and knows where it stands. “Make the dashboard better” has nothing. There’s no signal that says done, so the agent either spins forever or declares victory on a target it invented.

So the skill that replaces prompt-writing isn’t loop design. It’s writing a goal with a test attached. Before you start a loop, answer one question: how will the agent know it succeeded, without me reading the output? If the only judge is your own eyes, you don’t have a loop. You have an assistant that never learned to stop.
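Here is a minimal sketch of a goal with a test attached, assuming the check is a command whose exit code signals pass or fail. The function name `goal_is_met` and the timeout default are hypothetical choices for illustration, not part of any particular agent framework.

```python
import subprocess


def goal_is_met(check_cmd: list[str], timeout_s: float = 600.0) -> bool:
    """Return True when the check command exits with status 0.

    The agent never grades itself here: the verdict comes from the exit
    code of a tool you already trust (test runner, type checker, linter).
    A missing executable raises FileNotFoundError rather than passing quietly.
    """
    try:
        result = subprocess.run(
            check_cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A check that hangs counts as "not met", never as success.
        return False
    return result.returncode == 0
```

If the suite happened to run under pytest, the goal from the example above would reduce to something like `goal_is_met(["pytest", "e2e/"])`. “Make the dashboard better” has no command to put in that list, which is the whole problem.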

The cheap, trustworthy judges are the ones you already own, such as a test suite, a type checker, a linter, a schema validator, or a diff against expected output. The expensive judges are the ones that cost as much as the work itself — another model grading the first model, or you. When checking the answer is as hard as producing it, a loop adds cost and removes nothing.
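As one illustration of a cheap judge, here is a sketch of a diff against expected output using only the standard library. The function name `diff_against_expected` is hypothetical; the assumption is that the task produces text you can compare line by line.

```python
import difflib


def diff_against_expected(actual: str, expected: str) -> list[str]:
    """Return unified-diff lines; an empty list means the output matches."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

An empty result is a pass. A non-empty one is a fail that costs nothing to compute and tells the agent exactly which lines are still wrong.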

A loop needs three limits, not one

The diagram has one exit: goal met. Real loops need three more, because “goal met” is the exit that might never fire.

An iteration cap. A hard ceiling on turns. Not because you expect to hit it, but because the runs that blow up your bill are the ones that never converge, and a cap is the only thing standing between “didn’t converge” and “didn’t stop.” Pick a number — ten, twenty — and treat hitting it as a failure to investigate, not a budget to spend.

A cost ceiling. Count tokens, not just turns. An agent that re-reads a 100,000-token context on every pass has spent two million input tokens over twenty iterations on re-reading alone, before counting a single token of output. Set a budget in tokens or dollars and stop when you reach it, the same way you’d put a timeout on a network call you don’t fully trust.

A no-progress detector. This is the one people skip, and it’s the one that saves the most. Track the success signal across iterations. If the last three passes didn’t move it (same number of failing tests, same lint errors), the agent is stuck, and ten more turns won’t help. Stop on stall, not just on success or on the cap.
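The three limits can be sketched as one small driver. This is an illustration under stated assumptions, not a definitive implementation: `StepResult`, `run_bounded_loop`, and the `step` callable are hypothetical names, the success signal is assumed to be a count of failures where zero means done, and “progress” is assumed to mean strictly fewer failures than the best pass so far.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    tokens_used: int  # tokens billed for this iteration (input + output)
    failures: int     # success signal: 0 means the goal is met


def run_bounded_loop(
    step: Callable[[], StepResult],
    max_iterations: int = 20,
    max_tokens: int = 2_000_000,
    stall_window: int = 3,
) -> str:
    """Run `step` until the goal is met or one of three limits trips.

    Returns the reason the loop stopped, so a caller can treat anything
    other than "goal_met" as a failure to investigate.
    """
    tokens_spent = 0
    best_failures: Optional[int] = None
    passes_without_progress = 0

    for _ in range(max_iterations):
        result = step()
        tokens_spent += result.tokens_used

        if result.failures == 0:
            return "goal_met"

        # Cost ceiling: stop once the budget is used up.
        if tokens_spent >= max_tokens:
            return "cost_ceiling"

        # No-progress detector: reset on improvement, otherwise count the stall.
        if best_failures is None or result.failures < best_failures:
            best_failures = result.failures
            passes_without_progress = 0
        else:
            passes_without_progress += 1
            if passes_without_progress >= stall_window:
                return "stalled"

    # Iteration cap: the loop ran out of turns without converging.
    return "iteration_cap"
```

The default token budget matches the arithmetic above: twenty passes over a 100,000-token context is 20 × 100,000 = 2,000,000 input tokens. With these defaults, an agent stuck on the same failure count stops after four passes instead of twenty.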

When a loop is worth it

Loops are worth it when verification is cheaper than the work and the task genuinely needs iteration. Fixing a failing test suite fits: running the tests is fast and certain, and the work is real trial and error. Migrating a few hundred files to a new API fits, if you can check each one mechanically.

Loops are waste when the task is one-shot or has no honest pass/fail. Looping a model on “write a good post” doesn’t produce a good post. It produces however many drafts your cap allows, and then you read all of them anyway — which is the human review you were trying to skip, now multiplied by the iteration count.

Enterprise AI teaches the same lesson more broadly. The models were rarely the reason projects failed; the discipline around them was. An agent loop concentrates that truth into a single design decision and then bills you for getting it wrong.

What this looks like in practice

Before I start an agent loop, I write down three things: the goal, the check that proves the goal, and the three limits. If I can’t name the check, I stop and fix the goal first, because a loop without a check isn’t ready to run. If the check costs as much as the task, I don’t loop at all — I do it once and review it myself.
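That checklist can be written down as a small record that refuses to exist without a check. This is a sketch only: `LoopSpec` and its field names are hypothetical, and the check is assumed to be a command whose exit code proves the goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSpec:
    goal: str             # what the agent is asked to achieve
    check_cmd: list[str]  # command whose exit code proves the goal
    max_iterations: int   # iteration cap
    max_tokens: int       # cost ceiling
    stall_window: int     # passes without progress before stopping

    def __post_init__(self) -> None:
        if not self.goal.strip():
            raise ValueError("write the goal down before starting the loop")
        if not self.check_cmd:
            raise ValueError("no check named: fix the goal before looping")
        for name in ("max_iterations", "max_tokens", "stall_window"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")
```

Constructing a spec for “Make the dashboard better” with an empty `check_cmd` fails immediately, which is cheaper than finding out twenty iterations later.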

None of this is exotic. It’s the same discipline you’d apply to any process that runs without a human watching it: define done, bound the cost, detect when it’s stuck. The agent is new. The engineering isn’t.