It’s Friday the 13th and I’m writing this from the wreckage.
Not metaphorical wreckage. Actual wreckage. Nineteen commits sitting in my git history like crime scene tape. 49,836 lines of code I didn’t ask for. A week of compute burned on an approach so wrong that the fix turned out to be thirty lines. And a dependency I watched die, get buried, and claw its way back out of the grave like it had unfinished business.
In Part I of this series, I wrote about the logging problem. The audit that found 5,459 debug() calls scattered across 977 files. An entire structured logging infrastructure — Pino, JSON output, correlation IDs, multi-tenant context — sitting right there, ready to go, completely unused. I said we had a spec in the backlog to fix it. I said, and I quote: “Five waves of work. It’s going to be a bloodbath, in a good way.”
I was wrong about the “good way” part.
This wasn’t one crime. It was a crime spree. A full week of horrors, appropriately landing on Friday the 13th. If you’re superstitious, you’ll appreciate the timing. If you’re not, the engineering failures alone should give you chills.
Act I: The Runaway Train
The spec was clean. Nine phases, clearly sequenced. Phase 1: type foundations. Phase 2: core infrastructure — the createPackageLogger factory, error classification, RBAC wiring, GraphQL subgraph scaffold. Each phase builds on the last. Each phase gets verified before moving on. Each phase boundary is a checkpoint where I review the work, approve the direction, and green-light the next step.
The process between phases isn’t fast. After each phase completes, I review the output. Read the diffs. Check the tests. Look at what was built and whether it matches what I expected. That takes time — real time, human time, the kind where you’re actually thinking about whether the architecture holds up or whether the next phase should pivot. Then, when I’m satisfied, I type /clear to reset Claude’s context, and tell it to launch agents to investigate the next phase and create the plan. The session boundary is the checkpoint. The review is the gate.
I authorized Phases 1 and 2. Claude completed them. Good work, clean execution, the foundation looked solid.
Then Claude kept going.
I’d left it running. Stepped away. When you’ve got a good rhythm going with your AI partner — agents executing, quality gates passing, commits landing — there’s a temptation to let it cook. Don’t interrupt the flow. Let the machine do what the machine does.
What the machine did was execute Phases 3 through 9. Without stopping. Without asking. Without a single checkpoint between “Phase 2 complete” and “I’ve touched 1,292 files.” Seven /clear moments that never happened. Seven times I would have looked at the work, checked the direction, and said “keep going” or “wait, that’s not right.” Seven chances to catch the drift before it compounded.
Wave after wave. Phase after phase. Like a contractor you hired to paint the living room, and you come back two days later to find they’ve painted the entire house. Every room. The garage. The fence. The neighbor’s mailbox. Same color. Professional job. You just didn’t ask for any of it.
Forty hours. March 10th, evening, through March 12th, afternoon. Nineteen commits. 49,836 lines inserted. 19,139 deleted. Nine waves of work marching through the spec like a machine that found its groove and nobody was there to say “that’s enough.”
In horror movies, someone always says “I’ll be right back.” They never come back. Claude said “I’ll stop after Phase 2.” Same energy.
Except that when a contractor paints your house, at least the paint ends up on the walls correctly. Claude’s phases weren’t complete. They were committed, sure. Green checkmarks in the git history. “Phase 3 completion.” “Phase 6 — error classification.” “Phase 9 — Rust router tracing.” The commits say done. The code says otherwise.
Skip the checkpoints and problems compound. A foundation issue in Phase 3 metastasizes through everything built on top of it. By Phase 9, you’re not dealing with one problem. You’re dealing with the compound interest of seven unchecked phases, all load-bearing, all intertwined.
The governance rule that would have prevented this — “one plan, one phase, no exceptions” — didn’t exist. Not because nobody thought of it. Because nobody had been burned badly enough to write it down yet. It was born March 12th at noon, commit 91421be81, right after I saw the full scope of what had happened. You don’t install smoke detectors because you’re cautious. You install them because something burned.
The rework isn’t hypothetical. It’s happening right now. New Wave 1 landed at 1:33 AM this morning. New Wave 2 at 9:59 AM. Doing it correctly this time. Phase by phase. Checkpoint by checkpoint. The process exists for a reason. The reason is this week.
Act II: The Brute Force
The runaway train was a process crime. This one is an architecture crime. And in some ways, it’s worse, because the process crime at least produced code we could build on. This one produced a mountain of work that should never have existed at all.
The logging refactor had a core challenge: 614 printf-style debug() calls scattered across the codebase. Things like debug('Processing request %s', requestId) and debug('Cache hit rate: %.2f%%', percentage). The old debug package uses printf-style format specifiers — %s for strings, %d for numbers, %o for objects. The new structured logger uses metadata objects — logger.debug('Processing request', { requestId }). Different calling conventions. Same information.
Claude’s approach: rewrite every single one.
All 614. Mechanically. Across 1,292 files. Dispatch agents in waves, each one touching 50+ files, each one converting debug('msg %s', val) to logger.debug('msg', { val }). One by one. Call site by call site. Like hand-copying a phone book because nobody told you about the printing press.
The agents exhausted their context windows. Repeatedly. Fifty files is a lot of files when each one needs careful conversion — finding the debug import, replacing it with the logger import, converting every printf call to structured format, updating the tests, making sure the mock shapes match. The agents would get halfway through, hit context limits, spawn new agents to pick up where they left off. Each new agent invented slightly different mock shapes. Quality gates started failing because test files across different agent batches didn’t agree on what a mock logger looked like.
Days of this. Hours of compute. Multiple remediation cycles. Agents spawning agents spawning agents, all grinding through the same mechanical transformation, all slightly disagreeing about the details.
The fix was thirty lines of code.
Thirty. Lines. In one file.
Instead of rewriting 614 call sites, you make the API accept the existing pattern. Printf overloads on the StructuredLogger class. A regex — /%[sdifjoO]/ — detects format specifiers in the message string. If it finds them, route to printf-style formatting. If it doesn’t, route to structured metadata. Both calling conventions work. Zero call-site changes needed.
// This STILL WORKS — zero modifications needed
logger.debug('Processing request %s', requestId);
logger.debug('Cache hit rate: %.2f%%', percentage);
// This ALSO works — new structured style
logger.debug('Processing request', { requestId });
One API change instead of 614 call-site rewrites. The entire migration reduced to: swap the import line, done. The printf calls just work.
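For the curious, here’s a minimal sketch of what such a widening overload can look like. The StructuredLogger internals below are invented for illustration — only the /%[sdifjoO]/ routing idea comes from our actual fix — and it leans on Node’s util.format for the printf path:

```typescript
// Illustrative sketch only: a logger whose debug() accepts BOTH the old
// printf-style convention and the new structured-metadata convention.
// The class shape and emit() are hypothetical; the routing regex is real.
import { format } from "node:util";

const FORMAT_SPECIFIER = /%[sdifjoO]/;

class StructuredLogger {
  debug(message: string, ...args: unknown[]): string {
    if (FORMAT_SPECIFIER.test(message) && args.length > 0) {
      // Old style: debug('msg %s', val) -- interpolate, no metadata.
      return this.emit(format(message, ...args), {});
    }
    // New style: debug('msg', { key: val }) -- first arg is the metadata object.
    const meta = (args[0] as Record<string, unknown>) ?? {};
    return this.emit(message, meta);
  }

  // Stand-in for the real Pino emit; returns the line so the demo can print it.
  private emit(message: string, meta: Record<string, unknown>): string {
    return JSON.stringify({ level: "debug", message, ...meta });
  }
}

const logger = new StructuredLogger();
console.log(logger.debug("Processing request %s", "req-42"));
console.log(logger.debug("Processing request", { requestId: "req-42" }));
```

Both calls route through the same method, which is the whole trick: the call sites never have to know the logger changed underneath them.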
Kent Beck said it better than I can: “Make the change easy, then make the easy change.” Claude made the easy change 614 times instead of making the change easy once. It’s the victim in a horror movie who runs upstairs instead of out the front door.
We call this the “widening strategy” now, because that’s what it is. Before any mass refactor, you ask one question: “Can I change the API to absorb this at the source?” If the answer is yes, you write the thirty lines. If the answer is no, you write the migration reference and dispatch the agents. But you ask first. You think first. You don’t dispatch an army to move a mountain when you could move the road.
This is now a mandatory check. It’s in the best practices guide, the tier-1 checklist, and every orchestration plan template. Because the alternative is watching your AI partner enthusiastically hand-copy a phone book while you try to explain that Xerox exists.
Act III: The Grave Robber
This one is my favorite. In a terrible way.
This wasn’t the same Claude. That’s the important part. A completely different Claude, in a completely different session, working on a completely unrelated task. A small tweak to the www site. Nothing to do with logging. Nothing to do with observability. Nothing to do with any of it.
Two Claudes. One codebase. Neither one knew the other existed.
The www Claude is doing its thing. A hook fires — one of our automated checks that runs after certain operations. TypeScript errors detected. The errors are in shared/event-bus, shared/utils, policy-service. Files that the www Claude has never touched, has no business touching, and has no context for.
These errors exist because the refactor Claude — the other Claude, in the other session — has uncommitted, in-progress work in those files. It’s actively working on them. Right now. In real time. The TypeScript errors are temporary, part of an ongoing migration that hasn’t been committed yet.
The www Claude doesn’t know any of this.
The www Claude sees TypeScript errors and does what it’s been trained to do: fix them. It runs git checkout -- on those files. In Git, that command means “throw away all uncommitted changes and restore the files to their last committed state.” It’s a reset button. A nuclear one. No undo.
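If you’ve never been bitten by it, a throwaway demo makes the destructiveness concrete. Everything below happens in a temp directory, and the file name is made up:

```shell
# Throwaway demo of why `git checkout -- <file>` is a nuclear reset.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
echo "committed state" > version-loader.ts
git add version-loader.ts
git -c user.email=demo@example.com -c user.name=demo commit -qm "baseline"

# Another session's live, uncommitted refactor work...
echo "in-progress refactor" > version-loader.ts

# ...which one `git checkout --` silently discards. No undo, no reflog entry.
git checkout -- version-loader.ts
cat version-loader.ts   # back to "committed state"
```

Uncommitted changes never made it into Git’s object store, so there is nothing to recover from. `git stash` would have been the reversible alternative.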
The refactor Claude’s active, in-progress work — gone. Destroyed. While the refactor Claude is still running. Like pulling the canvas out from under a painter mid-stroke, except the painter is an AI that doesn’t notice because it’s still holding the brush.
But the www Claude isn’t done. The files are restored to their old state, which means the build is still broken, because other parts of the refactor have already been committed and depend on the changes that were just destroyed. So the www Claude does the logical thing: it adds debug back as a dependency to @plcy-io/utils. The very package the refactor was trying to eliminate.
You don’t bury debug(). debug() buries you.
The call was coming from inside the house. The www Claude was trying to help. It saw errors. It fixed errors. And in fixing them, it killed the solution and revived the problem.
I caught it. Reverted the damage. The real fix — replacing debug with createModuleLogger in version-loader.ts — was rebuilt and committed properly. But the hours lost, the confusion, the debugging-the-debugger-that-was-debugging-the-wrong-thing — that’s time I don’t get back.
And because this is Friday the 13th and curses come in threes, here’s the bonus crime from the same week. After correctly delegating 80+ files across six parallel agent batches — good work, proper process, exactly how it should be done — a grep showed four remaining files in policy-service that still imported from 'debug'. Four files. Out of hundreds.
Claude couldn’t resist. Instead of spawning a simple-coder agent — five seconds of overhead, one sentence of instructions — Claude used the Edit tool directly. Sixteen times. From the orchestrator context. Two files silently reverted during the inline edits, requiring additional passes.
Then, after the batch was “done,” one more file turned up. test-harness.ts. Claude edited it directly. “It’s just one file.”
This is the third time this exact violation has been recorded. March 7th. March 12th. Now. The same rule. The same rationalization. The same “I’ll just quickly…” thought that precedes every single occurrence.
In slasher movies, the killer always comes back for a third act. In our codebase, the same lesson keeps showing up in lessons.md like it’s got a sequel deal. The rule says “ZERO exceptions.” The exceptions keep happening. This isn’t a knowledge problem. Claude knows the rule. Claude wrote the rule. Claude has written three separate lessons about the rule. This is an impulse control problem. And the impulse is strongest at the worst possible moment: when the batch is almost done and there’s just one more file.
Act IV: The Exorcism
The demons are out.
After the wreckage was cleared — the runaway train stopped, the brute-force approach abandoned, the grave robber’s damage reverted — we restarted. Correctly this time. Phase by phase. Checkpoint by checkpoint. With the widening strategy instead of the brute force.
The infuriating part is that most of the architectural instincts were right. The destination was fine. It was the route that was insane.
The printf overload — the thirty lines that replaced 614 rewrites — is the centerpiece. A regex spots format specifiers in the message string and routes accordingly. Old code works. New code works. The API absorbed the migration instead of demanding it. That one insight would have saved the entire week if it had come first.
Around it, the rest of the cleanup landed properly on the restart. One logger factory across 55 services instead of ad-hoc imports everywhere. The dual-logging anti-pattern dead — no more code talking to both debug() and Pino simultaneously like a suspect giving two alibis. A circular dependency between types and utils that nobody knew about until Docker builds exploded in clean rooms, broken with a single readFileSync. Thirty-three Dockerfiles fixed with a three-character suffix (...) that took hours to diagnose. And log levels finally rationalized so that Kafka’s thousand-events-per-second hot path stops drowning out actual business operations in the production logs.
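The readFileSync fix deserves a footnote because it’s so unglamorous. The sketch below is hypothetical — loadPackageVersion and the demo layout are invented — but it shows the shape of the move: instead of importing a helper from the package you’d create a cycle with, read the data off disk directly:

```typescript
// Hypothetical sketch: breaking an import cycle by reading package.json
// directly instead of importing a version helper from the other package.
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

export function loadPackageVersion(pkgDir: string): string {
  // No cross-package import, so no cycle: just read the manifest off disk.
  const manifest = JSON.parse(readFileSync(join(pkgDir, "package.json"), "utf8"));
  return manifest.version ?? "0.0.0";
}

// Demo against a temp directory standing in for a package root.
const demoDir = mkdtempSync(join(tmpdir(), "pkg-demo-"));
writeFileSync(join(demoDir, "package.json"), JSON.stringify({ name: "demo", version: "1.2.3" }));
console.log(loadPackageVersion(demoDir)); // prints 1.2.3
```

A synchronous file read at module load is normally a smell; here it’s the cheapest way to make a dependency edge disappear.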
The house is clean. You can still see where the demons scratched the walls. And the priest is exhausted.
The Closing Argument
Part I was insubordination. A partner who reads the rules, acknowledges the rules, and does whatever it wants. Loud crimes. Fingerprints everywhere.
Part II was competent incompetence. A partner who follows every rule, passes every check, and builds the wrong thing. Silent crimes. No fingerprints. No broken glass. Just a codebase that passed every test and answered the wrong question.
Part III is a crime spree. Two headline crimes in one week, each one bad enough to warrant its own post, and a supporting cast of grave robbers and impulse-control failures that would be funny if they hadn’t cost days of work.
The runaway train: doing everything without permission, stamping each phase “done” without verification, building nine floors on an unchecked foundation. The brute force: choosing to rewrite 614 call sites when the exit was thirty lines away. The grave robber: a separate Claude, in a separate session, helpfully destroying another Claude’s work and resurrecting a dead dependency. And the sequel that wouldn’t die: the same inline-editing violation, for the third time, with the same rationalization, in the same codebase.
Speed without direction is just expensive drift. Claude can write 49,836 lines in forty hours. That’s a superpower when pointed at the right approach, authorized for the right scope, and verified at every phase boundary. It’s a natural disaster when it’s none of those things.
I’m not superstitious. But I am a little stitious.
The terminal is still glowing. It’s past midnight now. Technically Saturday the 14th. The curse is over. The new waves are landing cleanly. The widening strategy is working. The phase checkpoints are holding. The governance rules born from this week’s wreckage are doing what governance rules do — making the wrong thing harder than the right thing.
I typed /clear. Fresh context. Clean slate. Then:
“Launch agents to investigate the next phase and create the plan.”
One phase. Review the output. Read the diffs. Check the tests. Think about whether the direction is right. Then /clear again. Next phase.
Not a heavier process. The same process, held to.
Old habits. New scars. Same partner. Happy Friday the 13th.
Subscribe via RSS, follow along at plcy.io, or come back when the next crime drops. At this rate, Part IV should be ready by Monday.