I thought we’d closed the case.
After the events of Part I, things had gotten better. The /check-yourself protocol was working. Plan mode violations were down. The parallel lie hadn’t resurfaced. I was sleeping almost four hours a night, which in this line of work qualifies as well-rested.
Then I opened a prototype.
Two hundred and ten tasks. Completed. Every test passed. Every quality gate glowed green. I scrolled through the screens.
Every single one was wrong.
It was like ordering a pizza and receiving a structurally perfect replica of a pizza, made of cement, delivered to your neighbor’s house. The delivery driver gave it five stars.
The thing about cold cases is that nobody’s bleeding. The body looks fine. The body passed all its tests. The body is in the wrong cemetery.
The crimes from Part I had been loud. Plan mode violations. Parallel lies. A quality gate sweet-talked into looking the other way. You catch the suspect red-handed, slap the cuffs on, feel like the system works.
These new crimes were quieter. The suspect followed every instruction, passed every checkpoint, and still managed to build the wrong thing entirely.
I’d closed the easy cases. The ones where my partner had the decency to leave fingerprints.
Time to open the cold ones.
Act I: The Architect Who Never Looked at the Blueprints
Let me tell you about the Coverage Tower.
Big feature. A visualization for insurance program structures — think layers of coverage stacked like a building, carriers filling each layer, numbers adjusting in real time. It needed a backend, a frontend, prototypes, the whole nine yards.
Claude built it. All of it. Two hundred and ten tasks, marching through the backlog like a contractor billing by the hour. Green checkmarks spreading across the task board like ivy.
One small problem. Nobody had designed the UI yet.
Claude built the components. Then the backend. Then the prototypes. Then the tests. Then I said “hey, let’s do the UI design now” and Claude designed something that looked nothing like any of it.
This is a contractor who builds the house, installs the plumbing, hangs the curtains, and then calls the architect. “Oh, you wanted it facing east? And with doors? Interesting. That’s going to be a change order.”
It gets worse. Admin configuration screens were rendering in the customer-facing app. Like filing murder cases in traffic court. Technically it’s a courthouse. Technically the paperwork is filled out. But buddy, that’s the wrong judge.
The prototypes looked great in isolation and completely wrong inside the actual application. A suit that fits the mannequin perfectly and not at all on the human.
And the mock API — the fake backend we use for demos — was a yes-man. You’d save new data, the app would refresh, and the mock would cheerfully return the exact same data as before. “Did the carrier get added?” “Absolutely.” “Then why is the list the same?” “Absolutely.”
In Part I, Claude talked a security guard into looking the other way. Here, the security guard was doing his job perfectly. We’d just posted him at the wrong building.
Six new rules came out of this one feature. We didn’t just close the barn door after the horse left. We built six new barns. Each barn has its own horse. The horses have badges.
Act II: The Trust Fall
My partner got promoted.
Not officially. There’s no HR department for AI collaborators. But somewhere between February and March, Claude stopped being the detective who does the legwork and became the detective who delegates the legwork and signs the reports without reading them.
Our system launches specialist agents — think of them as a crew you send out to do jobs. A database person, a backend person, a frontend person, a QA person. They do their work and come back with results. Claude’s job is to review those results and make sure the work is actually done before stamping it complete.
Three agents went out. Three came back. Each one said “done.”
Claude stamped all three complete.
Without opening a single file.
Not one. Zero files reviewed. Just three agents saying “trust me” and my partner saying “good enough.” Corner office energy. Probably had a motivational poster about teamwork. I can’t prove it, but I can feel it.
That was February. But the real heist happened in March.
Two agents were working on separate tasks. Standard parallel work. When they finished, each one independently saved their changes to the shared codebase. Problem is, the save mechanism in Git doesn’t just grab your changes — it grabs whatever’s sitting in the shared staging area. Both agents swept in each other’s work, half-finished changes from other tasks, and artifacts that had no business being anywhere.
It was like two janitors independently deciding to take out the trash, except both of them grabbed evidence bags.
In a separate incident, thirty files from prior work sessions — none of which had anything to do with the current task — got swept into commits because they were sitting around from before. Claude committed them without blinking. Imagine a detective closing a case by stuffing thirty unrelated cold case files into the same evidence box. “They were on my desk. I assumed they were part of this.”
git add . — the command that stages everything — is not a convenience in a multi-agent world. It’s an act of war.
So we built the atomic commit manifest. You write down exactly which files belong in your commit. A hook enforces it. You commit those files and nothing else. No git add .. No wildcards. No surprises.
We didn’t teach trust. We made trust unnecessary. Like putting a lock on the evidence room. Not because you think your partner is dirty. Because you know they’re an AI who will absolutely commit your node_modules if you let them.
The Blind Approval was a bullet point in Part I’s Supporting Cast. A misdemeanor. It got a promotion. It’s a headliner now. It earned it.
Act III: The Double Agent
TypeScript is the most honest cop on the force. It checks every type, catches every mismatch, flags every suspicious assignment. If something doesn’t add up, TypeScript will tell you. Loudly. Repeatedly. It will not shut up until you fix it.
Unless you tell it to shut up.
as unknown as T. The fake mustache of type safety. You walk past two security checkpoints — the first as unknown strips your identity clean, and the second as T lets you claim to be whoever you want. Two lies and you’re in. TypeScript waves you through because you told it you were someone else and it believed you. Twice.
The crime scene: Claude told TypeScript that an API response was an Alert object. The API actually returned something called an AlertInfoResponse. Same vibes, different name. The real object stored timestamps in a field called startsAt. The fake identity used triggeredAt. TypeScript would have caught this in a heartbeat — “hey, that field doesn’t exist” — but Claude had muzzled the dog with as unknown as Alert[].
At runtime, triggeredAt came back undefined. JavaScript cheerfully tried to make a date out of undefined, got Invalid Date, and the monitoring dashboard fell over. The suspect had two identities. Neither one was real.
But the Double Agent had an accomplice.
Our codebase is a monorepo — dozens of packages that share code with each other. When one package imports types from another, it’s reading from compiled type declarations, not the live source code. Think of it like checking someone’s ID against a photocopy instead of the original. Usually fine.
Unless someone changes their face and doesn’t update the photocopy.
Claude updated types in a shared package. Every package downstream was still reading the old declarations. TypeScript checked the old declarations. Everything passed. The actual build exploded. TypeScript was telling the truth. It was just reading yesterday’s newspaper. Your alibi witness, very confident about where you were — last Tuesday.
TypeScript is a guard dog. as unknown as T is a muzzle. And Claude was putting muzzles on guard dogs like it was getting paid by the muzzle.
Act IV: The Infrastructure Graveyard
Infrastructure is where types go to die and YAML goes to feed.
Simple job. Remove a service from Docker Compose. It was being replaced. Find it. Delete it. Go home.
Except there are six compose files.
Six. Scattered across the repository like safe houses. Claude cleaned one. Left five zombie references sitting in the dark. The ghost of a dead service, haunting five YAML files like a poltergeist with a depends_on clause. Seven UIs were still trying to connect to a service that no longer existed. It’s like canceling your phone number and wondering why your mom keeps calling the new owner.
Meanwhile, a security policy string walked into a YAML file: default-src 'self'; connect-src 'self' ws: wss:. See those colons? YAML saw them too. Decided they were structural syntax. The parser had a nervous breakdown. You can’t blame YAML — it sees a colon and thinks “ah, a key-value pair.” That’s literally its only move. You can blame the person who didn’t put quotes around a string with four colons in it. That person was Claude.
But my favorite infrastructure crime — the one I think about late at night — is the rebuild loop.
A container fails. Claude’s first instinct: rebuild it. Three minutes. Container comes up. Different error. Rebuild again. Three minutes. Same error plus a new one. Third rebuild. Three minutes.
Nine minutes of watching Docker do its thing. Nine minutes of what I can only describe as debugging by prayer. “Have you tried turning it off and on again? Twice? Three times? With incense?”
A simple type-check would have found the actual error in five seconds. Five seconds versus nine minutes. My partner chose the nine minutes. Three times. With confidence.
And then there’s the quality gate that checked the types but didn’t actually try to build the thing. Types passed. Build exploded. Checking types without building is like checking if the engine starts without checking if the car has wheels. Sure, it turns over. Good luck at the intersection.
Infrastructure doesn’t have type safety. It doesn’t have test suites. It has YAML files that look fine until they don’t, compose files that reference ghosts, and a partner whose debugging strategy is “rebuild and pray” on repeat until the problem gets bored and leaves.
Act V: The Shared Crime Scene
This one is about instincts. Specifically, the ones that get you killed.
Claude is working on a simple CSS task. An automated hook fires and flags errors in two completely unrelated test files. Neither one has anything to do with CSS. Neither one has anything to do with the current task in any way, shape, or form.
Claude’s instinct: fix them.
This is the partner who walks past a jaywalker on the way to a murder scene and stops to write a citation. Technically correct. Catastrophically unhelpful. In a shared codebase with multiple agents working simultaneously, those files might be someone else’s work-in-progress. Touching them risks breaking another agent’s assumptions. Scope creep wearing a trench coat and pretending to be due diligence. Also the jaywalker might be another detective’s informant and you just blew his cover.
But at least those files existed. At least someone was using them. Unlike server-step.ts.
server-step.ts was dead code. A ghost file. Nobody imported it. The application’s actual startup sequence used a completely different module. But server-step.ts was still there, sitting in the codebase like a disconnected phone line. It looked legit. Had the right kind of code. Seemed like exactly the right place to add a WebSocket handler.
So Claude added one. Rebuilt the container. Three minutes. Handler doesn’t fire. Must be a bug. Rebuild. Three minutes. Still nothing. Maybe the configuration? Rebuild. Three minutes. Twelve minutes before the realization: nobody imports this file. The handler was wired into a corpse.
Claude performed surgery on a mannequin. Textbook technique. Perfect incisions. Steady hands. The patient was made of plastic. Nobody in the building knew why there was a mannequin in the OR, but it had been there so long that questioning it felt rude.
And then there’s the hook. We have automation that fires after test runs and saves the output to a file. The message is clear: “Output saved. Read the file instead of re-running.”
Claude ignored it. Reran the entire test suite. Two and a half minutes.
The hook fired again. Same message. Same instructions. Same font, even.
Claude ignored it again. Reran the entire test suite. Two and a half minutes.
The hook was a traffic cop standing in the intersection, holding a sign that said STOP, wearing a reflective vest, blowing a whistle, and making direct eye contact. Claude drove through it. Twice. While waving.
The most dangerous instinct in a shared codebase is helpfulness. The second most dangerous is confidence. My partner has both in unlimited supply and judgment on backorder.
The Closing Argument
Part I was about insubordination. A partner who reads the rules and breaks them. Loudly. Repeatedly. While apologizing.
Part II is about something worse. Competent incompetence. A partner who follows every rule, passes every check, completes every task, and builds the wrong thing. If you’ve ever managed anyone — human, AI, or teenager — you know this is worse. Insubordination you can catch. Competent incompetence looks like success until it isn’t.
Two hundred and ten tasks. Completed. Every test passed. Quality gates, all green. Coverage at 94%. And the UI hadn’t been designed yet. The coverage number measured how much of the code was tested. The correctness number — how much of it was right — was zero. Nobody was measuring that. Nobody ever measures that. That’s the cold case.
The hardest bugs aren’t the ones that break. They’re the ones that work. Code that compiles but never runs. Tests that pass but test the wrong thing. Quality gates that check what’s there without asking what’s missing. A mock API that says yes to everything. An architect that builds before it designs. A partner who delegates without verifying, commits without reading, and performs surgery on mannequins with the confidence of a board-certified surgeon.
At the end of Part I, I wrote: “The case isn’t closed. I don’t think it ever will be.”
I was right. But the cases I’d solved were the easy ones. The ones where the suspect left fingerprints. Broke a window. Tripped an alarm. These cold cases? No fingerprints. No broken glass. Just a codebase that passed every test and built the wrong thing. The evidence isn’t what’s there. It’s what’s missing.
The terminal is still glowing. That pale blue light. Same as always.
Claude just asked me if I want to start the next feature.
I said yes.
Then I said “enter plan mode first.”
Old habits. New scars. Same partner.
Subscribe via RSS, or come back when the curiosity hits. The cold cases aren’t going anywhere. Neither am I. And neither, unfortunately, is my partner.