The Five Failures That Shaped My Personal AI Stack

Every working stack is the residue of failures the operator did not see coming. The Saturday piece showed the architecture as it stands now. This piece is the inverse — the five specific incidents that produced the current shape. Each one started as a quiet bug and ended as a permanent change in how the system runs.

Failure 1: The Eleven-Day Stale Lock

On May 15 the session-end auto-commit hook tried to commit pending changes and failed. The commit attempt collided with a .git/index.lock file that had been sitting in the repo since May 3 — a zero-byte file created by a crashed git process eleven days earlier. The hook had been quietly failing every session in between, and nobody had noticed because the failure mode was silent.

Root cause: the hook had no defense against orphaned lock files. The original code assumed any .git/index.lock it encountered was held by a live git process, which is true ninety-nine times out of a hundred. The hundredth time was a process that died without releasing the lock.

Fix: a five-line stale-lock cleanup block. The hook checks for .git/index.lock before attempting the commit. If the lock exists, it checks the file’s mtime against the current time — a lock older than five minutes is suspicious. If the mtime is old, the hook then verifies via lsof that no live process holds the file. Both conditions true: delete the lock. Either condition false: preserve it.

Healthy auto-commits complete in under a second, so a five-minute threshold cannot race a real concurrent run. I tested the change across three scenarios (no lock, old lock with no holder, fresh lock with a live holder) before it shipped.
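
The check described above can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: the function names, the return values, and the injectable isHeld parameter are hypothetical, and it assumes lsof exits 0 when some process has the file open and non-zero otherwise.

```typescript
import * as fs from "node:fs";
import { spawnSync } from "node:child_process";

const STALE_AFTER_MS = 5 * 60 * 1000; // the five-minute threshold from the text

// Assumption: `lsof <path>` exits 0 when at least one process has the file
// open and non-zero when none does.
function heldByLiveProcess(lockPath: string): boolean {
  const result = spawnSync("lsof", [lockPath], { stdio: "ignore" });
  if (result.error) return true; // lsof could not run: be conservative, keep the lock
  return result.status === 0;
}

type LockAction = "none" | "preserved" | "removed";

// Deletes the lock only when it is both old and unheld; otherwise leaves it alone.
// `isHeld` and `now` are injectable so the logic can be exercised without lsof.
function cleanStaleLock(
  lockPath: string,
  isHeld: (path: string) => boolean = heldByLiveProcess,
  now: number = Date.now(),
): LockAction {
  let stat: fs.Stats;
  try {
    stat = fs.statSync(lockPath);
  } catch {
    return "none"; // stat failed, treated here as "no lock file present"
  }
  const oldEnough = now - stat.mtimeMs > STALE_AFTER_MS;
  if (oldEnough && !isHeld(lockPath)) {
    fs.unlinkSync(lockPath);
    return "removed";
  }
  return "preserved";
}
```

A hook would call cleanStaleLock(".git/index.lock") just before attempting the commit. Treating an lsof failure as "held" keeps the cleanup conservative: when in doubt, the lock stays.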

The general lesson: hooks accumulate edge cases. The version of the hook that survives a year of daily use is the version that handles the failure modes you discovered along the way.

Failure 2: The Silently Forked Database

For eight days, from May 4 to May 12, the content engine was writing to two different SQLite databases at the same time without anyone noticing. The canonical file at ~/services-local/content-engine/data/content.db was getting new topics from the cron pipeline’s daily trend-scan, and the manual publish scripts in the same directory were writing there too. But a separate copy of the same database at ~/.local/share/nexus/services-db/content-engine/content.db, which a broken Synology XSym symlink in the nexus path was silently resolving to, was getting the older trend-scan rows from the AI-driven path.

Both files had content rows, topics rows, and publications rows, and their ID ranges overlapped. This was not immediately catastrophic because the divergence was bounded: the handoff between the two files happened cleanly on May 4, when the manual sprint began. The overlapping IDs therefore produced no genuine collisions, only orphaned rows on each side that the other side did not know about.

Root cause: a Synology XSym pointer in the nexus directory that resolved one of the source files to a location other than the canonical one. An XSym file does not behave like a POSIX symlink across mount boundaries, and nothing surfaced the difference.

Fix: an ID-offset merge that brought the orphaned rows from the older file into the canonical one (topics +1000, content/research/publications +100). Both source databases were backed up before the merge. Afterward, the sqlite_sequence counters were bumped to cover the offset IDs, and PRAGMA foreign_key_check came back clean. The broken XSym symlink was replaced with a real POSIX symlink to the canonical path.
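
The part of an ID-offset merge that is easy to get wrong is that foreign keys have to move with the rows they point at. Here is a minimal in-memory sketch of the idea, using the offsets quoted above. The row shapes and function names are hypothetical (the real schema is not shown here), and the actual merge ran against SQLite rather than arrays.

```typescript
// Hypothetical row shapes, for illustration only.
interface TopicRow { id: number; title: string }
interface ContentRow { id: number; topic_id: number | null; title: string }
interface Snapshot { topics: TopicRow[]; content: ContentRow[] }

const TOPIC_OFFSET = 1000; // offsets quoted in the text
const CONTENT_OFFSET = 100;

// Throws if two rows in the merged table ended up with the same ID.
function assertUniqueIds(rows: { id: number }[], table: string): void {
  const seen = new Set<number>();
  for (const row of rows) {
    if (seen.has(row.id)) throw new Error(`duplicate id ${row.id} in ${table}`);
    seen.add(row.id);
  }
}

// Shifts every orphaned row's primary key, and every reference to a shifted
// topic, then appends the shifted rows to the canonical ones.
function mergeWithOffsets(canonical: Snapshot, orphaned: Snapshot): Snapshot {
  const topics = [
    ...canonical.topics,
    ...orphaned.topics.map((t) => ({ ...t, id: t.id + TOPIC_OFFSET })),
  ];
  const content = [
    ...canonical.content,
    ...orphaned.content.map((c) => ({
      ...c,
      id: c.id + CONTENT_OFFSET,
      topic_id: c.topic_id === null ? null : c.topic_id + TOPIC_OFFSET,
    })),
  ];
  assertUniqueIds(topics, "topics");
  assertUniqueIds(content, "content");
  return { topics, content };
}

// Rough analogue of PRAGMA foreign_key_check for the one relation modeled here:
// returns content rows whose topic_id points at no existing topic.
function danglingTopicRefs(db: Snapshot): ContentRow[] {
  const topicIds = new Set(db.topics.map((t) => t.id));
  return db.content.filter((c) => c.topic_id !== null && !topicIds.has(c.topic_id));
}
```

The offsets only work if they are larger than the highest existing ID in the canonical table, which is why the sketch checks for duplicates instead of assuming the ranges are disjoint.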

The general lesson: silent forks are the worst class of incident because they degrade trust in the data retroactively. Anything that reports counts, dedupes, or makes scheduling decisions against the table is suspect until reconciled.

Failure 3: The Re-Generated Drafts

On May 13 the 10 AM draft.ts cron produced two pending_review drafts for titles that had already been published in April. The system was about to ship a second copy of two pieces that had been live for weeks. The drafts sat in Slack for review and got caught before they shipped, but the failure mode was that the cron pipeline would have happily generated them again the next day and the day after that until someone noticed.

Root cause: two compounding gaps in the state machine. The content_approve handler in review-workflow.ts only advanced the content status; the topic status stayed at whatever the draft-runner left it, which meant a successfully published piece could leave its topic in drafted (happy path) or approved (if the Slack post mid-draft failed). Trend-scanner had a getPublishedContentTitles() dedupe; draft.ts did not. Then the May 12 DB merge brought two topics from the forked database in at status='approved', and the next day’s 10 AM cron drained them.

Fix in two parts. First, a defensive guard in draft-runner.ts imports getPublishedContentTitles, builds a lowercase Set once per run, and skips and archives any topic whose title matches an already-published title. Re-drafting a published title becomes structurally impossible regardless of upstream state-machine leaks. Second, a state-machine fix in review-workflow.ts calls updateTopicStatus(content.topic_id, 'archived') when the content_approve case fires with a non-null topic_id.
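
Both halves of the fix are small. The sketch below shows their shape under some assumptions: the row types, the function names splitAlreadyPublished and onContentApprove, and the choice to pass the published titles and updateTopicStatus in as parameters are all hypothetical, since the real signatures in draft-runner.ts and review-workflow.ts are not shown here.

```typescript
// Hypothetical shapes, for illustration only.
interface QueuedTopic { id: number; title: string }
interface ApprovedContent { id: number; topic_id: number | null }

// Part one: the guard in the drafter. The lowercase Set is built once per run,
// then every queued topic is checked against it.
function splitAlreadyPublished(
  queue: QueuedTopic[],
  publishedTitles: string[],
): { toDraft: QueuedTopic[]; toArchive: QueuedTopic[] } {
  const published = new Set(publishedTitles.map((t) => t.toLowerCase()));
  const toDraft: QueuedTopic[] = [];
  const toArchive: QueuedTopic[] = [];
  for (const topic of queue) {
    if (published.has(topic.title.toLowerCase())) {
      toArchive.push(topic); // already live: skip drafting and archive the topic
    } else {
      toDraft.push(topic);
    }
  }
  return { toDraft, toArchive };
}

// Part two: the state-machine fix. Approving content also retires its topic,
// so the topic cannot be drained by a later drafting run.
function onContentApprove(
  content: ApprovedContent,
  updateTopicStatus: (topicId: number, status: "archived") => void,
): void {
  if (content.topic_id !== null) {
    updateTopicStatus(content.topic_id, "archived");
  }
}
```

The guard matches on exact titles after lowercasing, so a retitled duplicate would still slip through. That is the reason the state-machine fix matters as well.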

The general lesson: a state machine is only safe when the invariants hold from both directions. The trend-scanner had the dedupe; the drafter did not. Now both do.

Failure 4: The 409 That Was a Success

On May 2 the Instagram carousel publish for a T3 piece returned an HTTP 409 from Late.dev — “exact content already scheduled,” with an existingPostId field pointing at the post the request had just created. The carousel had successfully scheduled. The response said it had failed.

Root cause: Late.dev’s API was returning a duplicate-detection error against requests it had itself just enqueued, before its internal scheduler reconciled them. The 409 was a race condition between insert and dedup-check.

Fix: a try/catch around the IG publish call that catches the 409, parses the existingPostId from the error response, and treats it as success — inserts a publication row pointing at the returned ID, marks the content row as status='published'. The fix lives in publisher.ts > publishToInstagram.
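
A hedged sketch of that catch follows. The error class, the schedule and recordPublication callbacks, and the assumption that the post ID is a string are all hypothetical stand-ins. The only detail taken from the incident is that the 409 response carries an existingPostId field.

```typescript
// Hypothetical error shape: stands in for whatever the real HTTP client throws.
class PublishHttpError extends Error {
  readonly status: number;
  readonly body: unknown;
  constructor(status: number, body: unknown) {
    super(`publish failed with HTTP ${status}`);
    this.status = status;
    this.body = body;
  }
}

// Returns the existing post ID when the error is a 409 that carries one,
// and null for every other kind of failure.
function existingPostIdFrom409(err: unknown): string | null {
  if (!(err instanceof PublishHttpError) || err.status !== 409) return null;
  const body = err.body;
  if (typeof body === "object" && body !== null && "existingPostId" in body) {
    const id = (body as { existingPostId: unknown }).existingPostId;
    return typeof id === "string" && id.length > 0 ? id : null;
  }
  return null;
}

// Wraps the publish call: a 409 with an existingPostId is recorded as a
// success, anything else is rethrown unchanged.
async function publishToInstagramSketch(
  schedule: () => Promise<string>, // resolves to the new post ID
  recordPublication: (postId: string) => void,
): Promise<string> {
  let postId: string;
  try {
    postId = await schedule();
  } catch (err) {
    const existing = existingPostIdFrom409(err);
    if (existing === null) throw err; // a real failure stays a failure
    postId = existing;
  }
  recordPublication(postId);
  return postId;
}
```

The narrow match is deliberate. A 409 without an existingPostId, or any other status, still surfaces as an error, so the wrapper only absorbs the one quirk it was written for.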

The general lesson: integrations with vendor APIs accumulate vendor-specific quirks. The fix is not to file a support ticket and wait. The fix is to handle the quirk inside your wrapper and move on. The May 2 incident produced Hard-Won Lesson #21 — the corpus reference to the broader pattern of catching false negatives at the integration layer.

Failure 5: The Named Foil

On May 4, the contrarian piece “Agentic Coding Isn’t the Trap. Supervising From Your Head Is.” named the writer of the original argument I was rebutting and proceeded to characterize their position in ways that pushed beyond what they had actually written. Twelve days later, on May 16, the author of the original piece pushed back publicly in the LinkedIn comments — quoting their own piece to show they had never advocated the specific thing I had implied they advocated.

The pushback was fair. The strawman risk had been highest precisely because their position was close enough to mine that the extrapolation felt safe. I acknowledged the correction publicly on LinkedIn, added an editor’s note at the top of the original Ghost post linking back to the comment, and shipped a new reusable script (scripts/add-editors-note-faye.ts) that uses the Ghost JWT auth pattern to add notes idempotently to any post.
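
The idempotent part is the piece worth illustrating. The sketch below covers only the HTML transformation, not the Ghost authentication or the API call. The function name and the wrapping markup are hypothetical, and it assumes the note's HTML survives a round trip through the CMS unchanged, which would need checking against the real editor format.

```typescript
// Prepends an editor's note to a post body, unless that exact note is already
// present. Running it twice yields the same result as running it once.
function withEditorsNote(postHtml: string, noteHtml: string): string {
  const block = `<p><em>${noteHtml}</em></p>`;
  if (postHtml.includes(block)) return postHtml; // already applied: no change
  return `${block}\n${postHtml}`;
}
```

Because the check keys on the note text itself, the script can be re-run after a failed or partial batch without stacking duplicate notes at the top of the post.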

Root cause: a voice-and-discipline gap, not a code gap. Two patterns compounded: naming a foil author in the prose, and using the negative-parallelism title pattern (“X isn’t Y. Z is.”) that depends on a strawman to work.

Fix: two new entries in the feedback memory. The first bans the “X isn’t Y. Z is.” title and lede pattern across the corpus. The second bans naming the contrarian target in prose: the link to the source piece can stay and the URL slug can carry the author’s name, but the in-prose attribution cannot. Both rules are now part of the auto-loaded session context. Subsequent pieces (the May 19 Goodhart piece responding to a field guide, the May 20 co-design piece responding to an academic article) followed both rules and shipped clean.

The general lesson: the corpus is the residue of editor’s notes. Every voice-discipline rule worth keeping was learned from a specific incident where shipping without it produced a public correction.

What Survives

The current stack is the survivor of these five incidents and a dozen smaller ones I am not writing up. The pieces that look obvious in retrospect (the stale-lock defense, the canonical-DB symlink, the dedupe guard in the drafter, the 409 catch in the publisher, the named-foil ban in the feedback memory) each came from a specific incident the original design did not anticipate.

The stack is not what I planned. It is what is left after the failures pruned the parts that did not work. Anyone reading the Saturday architecture piece is looking at the convex hull of those five corrections, plus the smaller ones, plus the parts that worked the first time.

Show your stack. Show the failures that shaped it. Show the editor’s notes. The thing that ships is the thing that survived.