Agentic Coding Isn't the Trap. Supervising From Your Head Is.

Lars Faye's "Agentic Coding is a Trap" is the most honest writing I've seen on AI skill atrophy. The studies he cites are real. The "supervision paradox" — needing the skills the agent erodes to oversee it — is the cleanest framing of the failure mode I've read. I want to push on the conclusion, not the diagnosis.

The Anthropic study Faye references — "How AI Assistance Impacts the Formation of Coding Skills" — found a 17% drop in skill mastery for developers using AI assistance, with debugging showing the steepest decline. That's the headline number. But the same study also found something that gets quoted less often. Developers who used AI for conceptual inquiry scored 65% or higher on the follow-up evaluation. Developers who delegated code generation to the model scored below 40%.

That gap — 65 versus 40, on the same tool and the same task — is the entire game.

What the same study actually shows

The variable that drove the difference wasn't whether the developer used the agent. It was how they supervised the work. The high-scoring group asked follow-up questions, combined generation with explanation, and used the model for conceptual gaps rather than for code-shaped output. The low-scoring group accepted what the model produced and moved on. Same tool. Two completely different supervision patterns. Two completely different outcomes.

Faye treats the headline 17% as evidence the tool is the problem. The 65/40 split inside the same paper says the supervision pattern is the problem. Those are different conclusions, and they call for different fixes.

The trap is the supervision pattern

Faye's prescription is to demote the AI: write pseudo-code by hand, treat the model as a "Ship's Computer, not Data," never delegate work you haven't done yourself. The implicit move is to relocate as much of the work as possible back into the developer's head, on the theory that the head is where supervision capacity has to live.

That theory is where I want to push.

The supervision paradox bites for one reason. The developer is being asked to be the entire supervisory apparatus, by themselves, in real time, using only working memory and personal vigilance. That fails. It fails the same way it fails for a senior engineer reviewing a 4,000-line PR from a junior at 4pm on a Friday. The bottleneck isn't the code. It's the cognitive substrate the reviewer is using.

Anything you don't exercise daily fades. If your supervision is "I personally read every line and hold the whole system in my head," then yes — once an agent writes more lines than you can read, you lose. Atrophy is the symptom. Personal vigilance as the supervision strategy is the part worth examining.

Move supervision out of your head

The fix that the 65% group implicitly used is not to type more code. It's to put supervision in places that don't atrophy.

That list is short and well-known:

  • Tests that fail when the contract breaks. Not coverage theater — real assertions on the edges that matter.
  • Types that refuse to compile when the shape is wrong. The compiler does not get tired at 4pm.
  • Lint and format rules that catch the patterns you keep correcting by hand. If you've corrected the same pattern twice, lint it.
  • Hooks at the runtime layer. Claude Code's PreToolUse and SessionStart hooks run deterministically — the model can't forget them. Rules that are regex-shaped and load-bearing belong here, not in a system prompt (a minimal sketch follows after this list).
  • Code review as the final gate. Same discipline humans have used to supervise other humans' code for fifty years. It works on agent output for the same reason it worked on junior output: the reviewer doesn't need to have written the code; they need to be able to defend it.
  • Append-only mistake logs. The Mistakes Become Rules pattern — one numbered file, the agent reads it at session start, every correction becomes a permanent entry. The supervision lives in the file, not in the next reviewer's recall.

Each of these is institutional memory. None of them depends on a single developer holding the whole system in working memory. All of them survive the developer taking three weeks off.
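
To make the hook and mistake-log items concrete, here is a minimal sketch. The event names, the JSON payload on stdin, and the exit-code convention (exit 2 blocks the tool call and routes stderr back to the model; SessionStart stdout is added to the session's context) follow Claude Code's hooks documentation as I understand it. The specific rule, the `.claude/hooks/guard.py` path, and the `MISTAKES.md` file name are illustrative assumptions, not a prescription.

```python
#!/usr/bin/env python3
"""A sketch of a deterministic supervision hook for Claude Code.

Register it in .claude/settings.json for both events, e.g.:

    "hooks": {
      "SessionStart": [
        {"hooks": [{"type": "command",
                    "command": "python3 .claude/hooks/guard.py"}]}],
      "PreToolUse": [
        {"matcher": "Bash",
         "hooks": [{"type": "command",
                    "command": "python3 .claude/hooks/guard.py"}]}]
    }
"""
import json
import pathlib
import re
import sys

payload = json.load(sys.stdin)  # every hook receives a JSON payload on stdin

if payload.get("hook_event_name") == "SessionStart":
    # Stdout from a SessionStart hook is added to the session's context,
    # so the agent begins every session with the accumulated rules.
    log = pathlib.Path("MISTAKES.md")  # assumed name for the numbered log
    if log.exists():
        print(log.read_text())
    sys.exit(0)

if payload.get("hook_event_name") == "PreToolUse":
    # One regex-shaped, load-bearing rule: no force-pushing. The check is
    # crude on purpose (it also catches --force-with-lease); tighten to taste.
    command = payload.get("tool_input", {}).get("command", "")
    if re.search(r"git\s+push\b.*(\s--force\b|\s-f\b)", command):
        # Exit code 2 blocks the tool call; stderr is fed back to the model.
        print("Blocked: force-push is forbidden by the mistake log.",
              file=sys.stderr)
        sys.exit(2)

sys.exit(0)
```

The point isn't this particular rule. It's that the rule fires on every matching tool call, deterministically, whether or not anyone remembered it that morning.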

The real test

Here is the question that separates the two groups in the Anthropic study, generalized.

Take three weeks off. An agent does the work in your absence, given only the repo, the tests, the lint, the hooks, the mistake log, and the review process. When you come back, is the codebase in a state you can defend?

If yes, supervision lives in artifacts. The agent is being supervised by the system you put in place, not by your personal vigilance. Atrophy of your typing speed is not a threat, because typing was never the supervision mechanism.

If no, the artifacts aren't there yet. Personal vigilance is the only thing standing between the codebase and chaos, and Faye's prescription is the right safety move for that situation. Demote the agent. Build the artifacts before you raise it back up.

Why "Ship's Computer, not Data" is too narrow

Faye's analogy locates judgment in one captain's head. That framing is the same shape as the paradox — supervision as a personal cognitive feat. It quietly assumes the developer is alone with the tool.

A different shape works better. The agent is a junior — fast, eager, occasionally confidently wrong, requires review. You are the senior. You don't supervise by re-typing the junior's work. You supervise by reading the diff, running the tests, checking it against the team's accumulated rules, and asking the junior to defend choices you don't understand. Anthropic's own Building Effective Agents framing assumes exactly this division of labor — the human owns the seams, the agent owns the steps between them. I made the same point about agency belonging at judgment seams when arguing against turning cron jobs into agents. The shape matches.

Senior engineers do not atrophy by not typing. They atrophy by not reviewing critically. That distinction is most of the game.

What Faye gets right that I'm not arguing with

Vendor lock-in is real. Token costs are unpredictable. Outages happen. Probabilistic systems require review cycles that deterministic ones don't. None of those go away in this reframe.

But they're risks to manage, not reasons to put supervision back in your head. You manage vendor risk with model-agnostic runtimes and the kind of prompts, skills, and hooks that move between models. You manage token cost with caching and tier discipline. You manage outages by having work that doesn't depend on a single API call to make progress. None of that is "type more code by hand."
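
For what "model-agnostic" means in practice, a sketch helps. Everything below is invented for illustration (the `Model` protocol, the `CannedModel` stand-in, the `review_pass` function); the point is the seam. Prompts, review logic, and artifacts live above it, and a vendor swap touches one adapter:

```python
from typing import Protocol


class Model(Protocol):
    """The seam. Everything above this interface is vendor-agnostic."""

    def complete(self, prompt: str) -> str: ...


class CannedModel:
    """Deterministic stand-in; also useful for testing the pipeline offline."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def review_pass(model: Model, diff: str) -> str:
    # The prompt knows nothing about which vendor sits behind the seam.
    return model.complete(
        "Review this diff and list any contract-breaking changes:\n" + diff
    )


if __name__ == "__main__":
    # Swapping vendors means swapping this one constructor, nothing else.
    print(review_pass(CannedModel("no contract-breaking changes found"), "<diff>"))
```

An adapter per vendor implements `complete`; the hooks, tests, and mistake log all sit on the caller's side of the seam and move with you.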

The shorter version

Skill atrophy under heavy agent use is real, and Faye is right to take it seriously. The skill that atrophies fastest is "personal vigilance as a supervision strategy," and that strategy was under pressure at scale long before agents existed. Agents just turn up the pressure.

The fix isn't only to demote the agent. It's also — and mostly — to promote the artifacts. Put the supervision in places that don't get tired, don't forget, and don't need to be re-derived from working memory every Tuesday morning. The 65% group in the Anthropic study were already doing this, even if the paper didn't name it that way.

The trap isn't agentic coding. The trap is treating supervision as a thing that lives inside one developer's head. Move it out, and the paradox eases.
