The Coding Agent Stack Has Two Layers

The current “Hermes Agent vs Claude Code” framing is the wrong comparison. The two tools live at different layers of the coding agent stack, and most of the YouTube hot takes that treat them as alternatives assume they compete for the same job. They do not. Claude Code is a worker. Hermes is an orchestrator that can use Claude Code as one of its workers. The question is not which to pick. It is which layer you are optimizing.

Here is what is actually different between them, and where each one earns its keep.

What Claude Code Is

Claude Code is Anthropic’s official CLI, with native access to Opus 4.7, Sonnet 4.6, and Haiku 4.5 — currently the strongest production-grade coding models. It runs on your machine, in your terminal or IDE, and pairs with the Claude Max subscription via OAuth so most users do not need a separate API key. The native tool-use loop — Read, Write, Edit, Bash, Task, Grep, Glob — is tight, the hooks system (PreToolUse, PostToolUse, Stop, SessionStart) is mature, MCP integration works, and the recently shipped /goal command added single-session unattended completion loops in v2.1.139.

Claude Code is stateless across sessions by design. Every conversation starts in an empty room. The --resume and --continue flags restore a single recent session; there is no persistent memory layer that surfaces what you worked on last Tuesday.

This is a feature, not a bug, if your work fits inside the session. Single-machine, in-the-loop coding work (pair programming with the agent, debugging a specific issue, refactoring a module, writing a script) is where Claude Code is hardest to beat. The model quality shows up most in the lines of code that get written between tool calls. On raw coding tasks where the answer fits the context window, Claude Code holds its own: in the 18-task comparison published this week, it won four of the eighteen tasks on raw coding ability alone.

What Hermes Is

Hermes Agent is the open-source orchestrator from Nous Research. Its v0.13.0 “Tenacity Release” shipped May 7 with persistent /goal loops, durable multi-agent Kanban with heartbeat and retry budgets, checkpoints v2, and post-write delta lint. The repository as of that release counts 864 commits and 588 merged PRs from 295 contributors — fast-moving but real.

The architectural difference from Claude Code is in three places.

First, memory is persistent and indexed. Hermes ships with a SQLite database with FTS5 full-text indexing that holds every session you have ever run through it. When you ask it to “fix the bug we were chasing on Friday,” it searches that index for Friday’s transcript, pulls the relevant turns into context, and resumes. The “Honcho dialectic user modeling” layer builds a deepening profile of how you work across sessions. This is the single biggest functional gap with Claude Code.
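To make the indexed-memory idea concrete, here is a minimal sketch of a session store built on SQLite FTS5. It is not Hermes’ actual schema: the table name turns, the columns, and the helper functions are all hypothetical, and it assumes your Python build of SQLite includes the FTS5 extension (most do).

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store with an FTS5 index over turn text (hypothetical schema)."""
    db = sqlite3.connect(path)
    # session_id and day are stored but not tokenized; only `text` is searchable.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS turns "
        "USING fts5(session_id UNINDEXED, day UNINDEXED, text)"
    )
    return db


def record_turn(db, session_id, day, text):
    """Append one conversation turn to the store."""
    db.execute(
        "INSERT INTO turns (session_id, day, text) VALUES (?, ?, ?)",
        (session_id, day, text),
    )
    db.commit()


def recall(db, query, limit=5):
    """Return the best-matching turns for a plain keyword query.

    FTS5 treats space-separated words as an implicit AND, and `rank`
    orders results by relevance (best match first).
    """
    rows = db.execute(
        "SELECT session_id, day, text FROM turns "
        "WHERE turns MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

An orchestrator built this way would call recall(db, "parser bug") before starting work and paste the returned turns into the worker’s prompt. Real queries would also need escaping, since FTS5 gives special meaning to characters such as quotes and hyphens.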

Second, the worker model is pluggable. Hermes does not write code itself in the way Claude Code does. It dispatches the actual code-writing to whichever model or CLI you have configured — OpenAI, OpenRouter, Nous Portal, Anthropic through API, or by shelling out to the claude CLI directly. The most common production setup right now is “Hermes orchestrates, Claude Code does the typing.” That is not Hermes competing with Claude Code; that is Hermes wrapping it.
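The “orchestrator shells out to a worker” pattern can be sketched in a few lines. This is an illustration of the idea, not Hermes’ dispatch code: the Worker class, the WORKERS registry, and dispatch are hypothetical names. The only real interface assumed is Claude Code’s print mode, claude -p, which runs a prompt non-interactively and writes the reply to stdout.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Worker:
    """A pluggable worker: a name plus a way to turn a prompt into a command."""
    name: str
    build_argv: Callable[[str], List[str]]


def claude_argv(prompt: str) -> List[str]:
    # `claude -p <prompt>` runs Claude Code in non-interactive print mode.
    return ["claude", "-p", prompt]


# Hypothetical registry; other CLIs or API-backed workers would be added here.
WORKERS: Dict[str, Worker] = {
    "claude-code": Worker("claude-code", claude_argv),
}


def dispatch(worker_name: str, prompt: str, run=subprocess.run) -> str:
    """Send a prompt to the named worker and return what it printed.

    `run` is injectable so the orchestration logic can be exercised
    without the worker CLI being installed.
    """
    worker = WORKERS[worker_name]
    result = run(
        worker.build_argv(prompt),
        capture_output=True,
        text=True,
        check=True,  # raise if the worker exits with a non-zero status
    )
    return result.stdout
```

Swapping workers then means adding an entry to the registry, which is the whole point of keeping the orchestrator and the worker at separate layers.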

Third, it runs anywhere. Six terminal backends — local, Docker, SSH, Daytona, Singularity, Modal — mean a Hermes session can hibernate on a serverless platform, resume on a phone, or run unattended on a remote box. Claude Code is single-machine by design.

In the same 18-task comparison, Hermes won fourteen of eighteen. The four it lost, it lost on raw coding. The fourteen it won, it won by remembering work from earlier sessions.

Where Each One Loses

Honesty about the weaknesses matters more than feature lists.

Claude Code’s actual weaknesses: stateless by default; tied to Anthropic models (with the upsides and downsides that come with vendor concentration); no native cross-session memory of any depth; single-machine; the plugin/skill marketplace is still forming. If your bottleneck is institutional context that builds over time, Claude Code does not solve it. You have to build the wrapper yourself, which is what the wrapper-pattern argument from May 2 is about: file-system-backed memory that you own and bring to every session.
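A file-system-backed memory wrapper can be very small. The sketch below is one possible shape, not the specific wrapper from the May 2 argument: the notes-file layout and the function names load_memory, append_memory, and build_prompt are assumptions made for illustration.

```python
from pathlib import Path


def load_memory(path):
    """Return the saved notes, or an empty string if none exist yet."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""


def append_memory(path, note):
    """Append one note as a bullet line, creating the file if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")


def build_prompt(path, task):
    """Prefix the task with notes from earlier sessions, if there are any."""
    memory = load_memory(path)
    if not memory:
        return task
    return f"Project notes from earlier sessions:\n{memory}\nTask: {task}"
```

At the end of a session you would call append_memory with whatever is worth keeping, and at the start of the next one pass build_prompt’s output to the agent. The file stays plain text, so you can read, edit, and version it yourself.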

Hermes’ actual weaknesses: a fast-moving open-source project means flaky updates and rough edges. The Python dependency surface is real. Setting up the persistent memory store, configuring providers, getting the right backend running, choosing the right model for each subtask: this is operator-grade work, not consumer-grade. The v0.13.0 release notes list eight P0 security closures, which tells you both that the project is being maintained seriously and that it was shipping with those security holes in the weeks before. The skill autocreation feature can manufacture procedural memory that is wrong, and there is no perfect way to audit a self-modifying skill base.

If you do not want to run a small piece of personal infrastructure, Hermes is not for you. If you do, Claude Code on its own leaves the persistent-memory layer unbuilt.

The Decision Matrix

The question “which one should I use” decomposes into “what is the work I am trying to do.”

Use Claude Code, on its own, when: the work fits in a single session; you are in front of the machine; the answer is code that needs to be written, not context that needs to be remembered; you want the strongest available coding model with the lowest setup friction; the cost of vendor concentration on Anthropic is acceptable. This covers most ad-hoc coding sessions for most developers.

Use Hermes, with Claude Code as the worker, when: the work spans days or weeks; institutional context (project history, prior decisions, partial state) matters more than raw coding speed on any one task; you want unattended runs (overnight, cron-triggered, mobile-initiated); you need parallel subagents on a Kanban; you want provider portability so you are not single-vendor; you can absorb the setup cost of running personal infrastructure.

Use both, in different roles, when: you do daily focused work in Claude Code for the in-session productivity, and run Hermes as the durable layer for cross-session continuity. This is the pattern that is starting to dominate among heavy users. The two stop competing the moment you treat Claude Code as a worker and Hermes as the orchestrator that calls it.
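The matrix above can be written down as a small function, which makes the decision order explicit. This is one reading of the three cases, not an official rubric; the Work fields and the recommend function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Work:
    spans_sessions: bool              # days or weeks, with context that must persist
    unattended: bool                  # overnight, cron-triggered, mobile-initiated
    needs_parallel_agents: bool       # parallel subagents on a Kanban
    needs_provider_portability: bool  # avoid single-vendor dependence
    can_run_infrastructure: bool      # willing to operate a personal service
    daily_in_session_coding: bool     # regular focused, in-the-loop coding


def recommend(w: Work) -> str:
    """Map a description of the work to one of the three setups."""
    needs_orchestrator = (
        w.spans_sessions
        or w.unattended
        or w.needs_parallel_agents
        or w.needs_provider_portability
    )
    # Without orchestrator-layer needs, or without the appetite to run
    # infrastructure, the worker on its own is the answer.
    if not needs_orchestrator or not w.can_run_infrastructure:
        return "claude-code"
    # Orchestrator needs plus daily in-session work: use both, in different roles.
    if w.daily_in_session_coding:
        return "both"
    return "hermes+claude-code"
```

Note what the second branch implies: someone who needs cross-session continuity but cannot run infrastructure still lands on Claude Code alone, with the persistent-memory layer left unbuilt.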

The Layer Question

The framing “vs” loses something important. Most coding-agent debates this year have been arguments about features that turn out to be at different layers of the stack. The persistent-memory question is at the orchestrator layer. The model-quality question is at the worker layer. The tool-loop question can sit at either. The IDE-integration question is at the worker layer. The unattended-run question is at the orchestrator layer.

If you keep arguing about which agent is best without naming the layer the argument is at, the argument never lands. Once you do name it, the right answer is usually both, in different roles.

Claude Code is the strongest worker available today. Hermes is the strongest open-source orchestrator that wraps a worker like Claude Code. The composed setup is where the real productivity lives. The vendor positioning makes them look like alternatives. The architecture says they compose.

