Stop Turning Your Cron Jobs Into Agents

The current message from engineering leadership at most companies I talk to is some version of: "find the deterministic automation in your stack and make it agentic." A recent r/devops thread captured the frustration: an SRE asked how to push back on a director who wanted every Airflow DAG converted into an agent loop because "agents are the future."

This is mostly bad advice. Not because agents are bad — they're great when the problem actually needs them — but because most existing automation does not need them and gets worse when retrofitted. Cron, Airflow, Step Functions, plain bash scripts: deterministic, idempotent, debuggable, free at the margin. Replacing them with an LLM call buys you variance you did not have, costs you tokens you did not spend, and produces logs you have to read instead of grep.

I run an agentic system as my daily driver. NEXUS has more than thirty scheduled processes — content scanning, finance sync, DeFi monitoring, Polymarket fair-value estimation, podcast digests, calendar audits. Of those, exactly three involve an agent in the loop. The rest are bash + cron + SQLite + a handful of LaunchAgents. They run silently, log structured output, and have not surprised me in months. The agents I do run are at the judgment seams — not the plumbing.

Here is the test I apply when someone wants to agentify something.

1. Does the input space exceed what you can pre-enumerate?

If the inputs are a known list — accounts to sync, files in a directory, customer records to enrich — you do not need an agent. You need a loop. Agents earn their keep when the input space is open: arbitrary user prompts, novel documents, situations the original author did not anticipate. If you can write the input down as an array, write a loop. If you cannot, an agent might be warranted.
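A minimal sketch of what "write a loop" means here — the names `accounts` and `sync_account` are hypothetical stand-ins for whatever your enumerable inputs and deterministic step actually are:

```python
# If the inputs are a known list, the loop IS the whole "agent".
accounts = ["ops", "billing", "staging"]

def sync_account(name: str) -> str:
    # Placeholder for a deterministic sync step (API call, rsync, etc.).
    return f"synced {name}"

results = [sync_account(a) for a in accounts]
```

No prompt, no retries on vibes, no token bill. If a step fails, the stack trace tells you which account and which line.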

This is the cleanest filter. Most "agentify our pipelines" pitches fail it on the first question.

2. Does the output require judgment, not pattern-matching?

A regex extracting amounts from invoices is pattern-matching. An LLM call interpreting a customer email and routing it to the right team can be pattern-matching too — but a fine-tuned classifier or a vector search will do it cheaper, faster, and with calibrated confidence. Agents earn their keep when the output requires reasoning across context the model has to pull together at run time. "Read this PR, find the architectural risk, and explain it to a junior engineer" is judgment. "Detect the language of this string" is not.
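To make the "cheaper, faster, calibrated" claim concrete, here is a toy nearest-centroid router. The vectors are stand-ins — in practice they would come from an embedding model — and the team labels are hypothetical; the point is that the similarity score gives you a confidence you can threshold on, which an agent's free-text answer does not:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy centroids; real ones would be mean embeddings of labeled examples.
team_centroids = {
    "billing": [0.9, 0.1, 0.0],
    "support": [0.1, 0.9, 0.1],
    "security": [0.0, 0.1, 0.9],
}

def route(email_vec):
    # Pick the most similar team; the score doubles as a confidence
    # you can threshold on before falling back to a human queue.
    team, score = max(
        ((t, cosine(email_vec, c)) for t, c in team_centroids.items()),
        key=lambda pair: pair[1],
    )
    return team, score
```

Low-confidence inputs go to a human or, if you must, to an agent — but only the residue, not the whole stream.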

Anthropic's own framing draws the same line between workflows (LLM calls orchestrated through predefined paths) and agents (LLMs deciding their own tool use and control flow). Most of what teams call "agents" is actually a workflow with a vibes-based orchestrator. Workflows are fine. They are also cheaper to operate, easier to test, and dramatically less likely to surprise you in production.

3. Will a human review every run before it commits?

If yes, you can be more permissive about agent variance. The human is the safety net. NEXUS's content pipeline is exactly this — Claude drafts a LinkedIn post, the post lands in Slack, I approve or reject before anything goes external. The variance is fine because I'm in the loop.
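The shape of that pipeline is simple enough to sketch. This is an illustrative stand-in — a list where NEXUS uses Slack — but the invariant is the same: agent output enters a review queue and nothing reaches the outside world without an explicit approval:

```python
# Human-in-the-loop gate, sketched with in-memory queues instead of Slack.
pending: list = []
published: list = []

def submit_draft(text: str) -> None:
    # Agent output goes into review, never straight to the world.
    pending.append({"text": text, "approved": None})

def review(index: int, approve: bool) -> None:
    item = pending[index]
    item["approved"] = approve
    if approve:
        published.append(item["text"])

submit_draft("Draft post about cron jobs")
review(0, approve=True)
```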

If no — if the system runs unattended, at scale, and acts on its outputs — every percent of variance becomes a percent of incidents. METR's RCT on experienced developers showed that even with humans reviewing AI output, the net effect on throughput can be negative. Without a human reviewer, the variance compounds without correction.

4. Is the cost of being wrong proportional to its frequency?

Deterministic automation fails predictably and rarely. Agents fail probabilistically and in uncorrelated ways. If the cost of one bad output is high and one bad output per ten thousand is plausible, the math gets ugly fast. An Airflow DAG that fails 0.01% of the time gets paged on, fixed, and forgotten. An agent that fails 1% of the time across a hundred-thousand-call workload is a slow-rolling incident with no obvious signature.
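The back-of-envelope arithmetic, using the rates above:

```python
# Expected failures per 100,000 runs at each failure rate.
runs = 100_000
dag_failures = runs // 10_000   # 0.01% rate -> 10 failures, one signature
agent_failures = runs // 100    # 1% rate -> 1,000 failures, each different
```

Ten failures share a root cause and a fix. A thousand probabilistic failures are a thousand tickets.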

Then there is the literal cost. An r/aws thread this week described a $97,000 surprise bill from a runaway workload — and that's deterministic infrastructure. Agentic workflows multiply this risk: token usage scales with input size, tool calls retry on transient failures, agent loops can recurse if the termination condition is poorly defined. The blast radius of a bad cost outcome is larger and harder to predict than for a Lambda that just runs longer.
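If you run an agent loop anyway, the guardrails the paragraph implies belong in the loop itself: cap steps and spend before the run starts, not after the bill arrives. A sketch, where `call_agent` and the token accounting are hypothetical placeholders for your actual client:

```python
MAX_STEPS = 8
MAX_TOKENS = 50_000

def run_agent(task, call_agent):
    """Run an agent loop with hard step and token budgets."""
    tokens_used = 0
    for step in range(MAX_STEPS):
        reply, tokens = call_agent(task)
        tokens_used += tokens
        if tokens_used > MAX_TOKENS:
            # Fail loudly instead of recursing into a surprise bill.
            raise RuntimeError(f"token budget exceeded at step {step}")
        if reply == "DONE":
            return step + 1, tokens_used
    raise RuntimeError("step budget exceeded without termination")
```

The termination condition is explicit and the worst case is bounded — the two properties a poorly scoped agent loop lacks.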

What "agentify it" usually means in practice

The honest version of most "agentify the pipeline" projects is one of these:

  • Wrap an existing script in an LLM call so the project counts as AI. Real motivation: the team needs to put something on the executive dashboard. The LLM adds nothing the script did not already do, but adds a per-run token cost and a non-deterministic failure mode.
  • Replace a switch statement with a prompt. This is the worst version. The original code was already an interpreter — for keys you defined, with branches you wrote. The prompt is the same logic in slower, more expensive, less testable form.
  • Add an agent because the team wants experience with the tooling. This is fine, if you scope it. Pick one step that actually has judgment in it. Leave the rest of the pipeline alone. Most teams cannot resist the urge to agentify everything.
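The second bullet deserves a picture. This is the "switch statement" kept deterministic — a dict dispatch over keys you defined, with illustrative handler names. Replacing it with a prompt re-implements the same table in a slower, more expensive, untestable form:

```python
def handle_refund(payload):
    return f"refund queued for {payload}"

def handle_cancel(payload):
    return f"cancellation queued for {payload}"

# The branches you wrote, for the keys you defined.
HANDLERS = {
    "refund": handle_refund,
    "cancel": handle_cancel,
}

def dispatch(event_type, payload):
    try:
        return HANDLERS[event_type](payload)
    except KeyError:
        # Unknown keys fail loudly and deterministically -- behavior a
        # prompt-based router cannot guarantee.
        raise ValueError(f"unknown event type: {event_type}")
```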

What you should agentify

The places where agents earn their keep in a real pipeline:

  • Content drafting, where a human reviews. Variance is the feature; the human is the filter.
  • Triage and routing of unstructured inputs, when the input space is large and a labeled training set is unavailable.
  • Decisions that require pulling context together at run time — code review with a human approving, incident summarization, customer issue triage with citations.
  • Exploratory tool use where the right sequence of operations is not known in advance — debugging, research, data exploration with a human in the loop.

The pattern: judgment, not plumbing. Variance acceptable, because there is a reviewer. Open input space, not enumerable.

A trap nobody warns you about

Sometimes the right migration is backwards. You shipped an agent six months ago because agents were the move. The agent is now slower, more expensive, and less reliable than the deterministic alternative would have been. The honest fix is to retire the agent and replace it with a script.

This is a hard call to make politically. Nobody gets promoted for replacing AI with bash. But the Project Vend retrospective from Anthropic showed exactly this — Phase 2 fixed Claude's vending-machine-shopkeeper failures less by upgrading the model and more by adding bureaucracy: a CRM, mandatory research steps before quoting, an inventory tool. The bureaucracy is what made the agent reliable. At some point, if you keep adding bureaucracy, you have rebuilt a deterministic workflow with extra steps.

Closing

The right question is not "how do we make this agentic?" The right question is: where in this pipeline does judgment have to happen at run time, on inputs we cannot pre-enumerate, with a reviewer present or stakes low enough that variance is acceptable? That set is small. It is real, but it is small.

Most of your automation should keep being cron, bash, and SQL. If you have to put one thing on the executive dashboard, put the part where the agent does not run. That is the part of your pipeline that ships at three in the morning without paging anyone, and the part that should keep running long after the AI hype cycle has moved on.
