You've started shipping faster with AI, and now each session threatens to turn you into your own full-time reviewer.

A good live sound engineer doesn't wait until the encore to learn the mix was muddy. They hear the room start to smear while the band is still playing, then they reach for the knob.

AI coding has the same problem.

You ask for speed. The model gives you output. Then the saved time disappears into checking everything it wrote.

I've noticed this fastest on small tools like Pricing Unit Picker or Barbell Position Calculator. In a larger codebase, smell matters even more because the blast radius is bigger. On a tiny tool, the review tax is just easier to see because there is nowhere for it to hide.

The caution comes from a sane place. Models are fast, confident, and fully capable of making local decisions that look tidy and age badly.

The standard advice is to review every line. For a tiny snippet, fine. For real product work, that advice quietly rebuilds the bottleneck you were trying to remove.

The Careful Advice That Recreates the Old Job

Once AI starts producing serious amounts of code, most people fall into two bad loops.

One loop is blind trust. You merge momentum and inherit cleanup later.

The other loop feels responsible. You read every line, verify every branch, and slowly turn yourself into a human checksum.

Both loops miss the upgrade.

The real promise was moving your attention up the stack, toward direction, tradeoffs, and early intervention.

That is why the productivity story around AI coding still feels slippery. In a 2025 METR study, experienced open-source developers working in repositories they already knew well took 19% longer when they were allowed to use AI tools.¹ That result does not mean AI slows everyone down. It does show how easily the overhead can move from typing into prompting, waiting, reviewing, and cleanup.

Reading every line sounds responsible because it protects you from obvious breakage. It also makes you personally absorb the full surface area of the model's output. Once the output gets large enough, your leverage collapses.

You become a human parser for machine enthusiasm.

Smell Shows Up Before Bugs Do

Good builders rarely catch the first sign of trouble at the level of syntax. They catch it earlier, at the level of direction.

A copy change grows teeth and asks for a new abstraction. A local fix reaches into seven files. A tiny feature suddenly needs a new dependency. The explanation sounds polished, but it slides right past your actual constraints.

That is execution smell.

I don't mean mysticism. I mean compressed pattern recognition.

The same smell has shown up for me every time I crossed domains. The medium changes. The first warning does not. It starts as drift, more side paths, more exceptions, more explanation than the job should need.

GitClear's 2025 report looked at 211 million changed lines from 2020 through 2024 and found more copy-pasted code alongside a drop in moved or refactored code.² That does not prove AI caused every bad change. It does match a familiar failure mode. Each individual function can look reasonable while the system gets fatter, patchier, and harder to reason about.

Picture a simple request: make the annual billing copy clearer on a pricing page.

The model comes back with a new constants file, a helper, an analytics event rename, a wrapper component, and a tidy comment about maintainability. Tests may still pass. The smell is that you just financed a future you never asked for.

This is why experienced builders often feel something off before they can fully explain it.

The smell comes first. The clean diagnosis arrives a minute later.

The Drift Check

What you need is a small framework for interrupting bad momentum early.

Call it the drift check. Run it while the AI is still working, not after the branch has already sprawled.

  1. Scope drift. Did the change grow wider than the request? A button fix became a refactor. A one-file task became a mini migration. Growth always asks to be paid for later.
  2. Abstraction drift. Did the model introduce a wrapper, helper, service, hook, or pattern your codebase did not need five minutes ago? AI loves symmetry. Your codebase has to live with consequences.
  3. Seam drift. Is a local change now pulling in distant files, glue code, or extra dependencies? Most long pain enters through seams.
  4. Story drift. Can the model explain, in plain language, why this approach fits your constraints and what tradeoff it chose? Once the explanation goes generic, judgment has already left the room.

This gives you places to interrupt without rereading the whole diff.

Ask for a plan before code. Cap diff size. Make the model name the invariant it is preserving. Ask what it decided not to change. Ask why a new dependency beats the primitives you already have. Ask for the whole task restated in one sentence before it touches anything.

These are not rituals. They keep smell close to execution.

And yet.

Once you get good at smelling drift, you can start interrupting for sport. Every wobble becomes a reason to step in. You feel discerning. The model turns from a genuine thinking partner into a nervous intern waiting for approval to breathe.

That is just micromanagement wearing a taste badge. The drift check only works if it stays small and consequential. Pressure points, not theater.

Review Pressure Points, Not Every Line

This does not mean skipping review.

It means reviewing like someone who understands where the danger actually lives.

GitHub's own guidance for AI-generated code starts with functional checks, then context and intent, then code quality, dependencies, and AI-specific pitfalls.³ That order matters because risk is unevenly distributed.

Some code deserves deep reading every time. Authentication. Billing. Permissions. Migrations. Concurrency. Deletion flows. Unfamiliar libraries. Anything with hard-to-reverse consequences.

If the counterargument in your head is, yes, but I still need to read every line on risky code, I agree. The point is not to stop being careful. The point is to stop spending the same level of care everywhere.

A lot of other code deserves strategic reading. File shape. Diff size. Tests. Error handling. Whether the new code matches surrounding style. Whether the model quietly created a second way to do something the codebase already knew how to do.

If I were sketching this on a whiteboard, I would still draw two loops.

In the first loop, AI writes, you read everything, you get tired, and the structural mistake still slips through because fatigue loves plausible code.

In the second loop, AI proposes, you catch drift early, you interrupt at pressure points, and you reserve deep reading for the places where consequences compound.

Only one of those loops preserves the point of using AI.

I care about this because my own temptation is not blind trust. It is over-control. When the model is productive, the easiest ego trap is feeling like the smartest person in the room because I can spot every wobble. Meanwhile the product ships slower.

Taste Moves Earlier in the Process

This shift matters because AI changes where taste shows up.

In the old workflow, taste often arrived after the fact. You wrote the code, stepped back, and judged whether it was elegant, coherent, and built for the right future.

With AI, taste has to arrive earlier.

While the shape is still soft. While the model can still be redirected. While you can still prevent a hundred lines of eager wrongness instead of cleaning them up afterward.

That changes who gets leverage. Typing speed matters less. API recall matters less. A live filter matters more. This is one place a broad background actually helps.

If you've spent years jumping between different kinds of work, you get suspicious of fixes that look clean in isolation. You've seen too many cases where the quick answer solved the immediate problem and made the overall system harder to hold in your head afterward.

That matters with AI because models are excellent at local improvements. The whole job is noticing when a local improvement is quietly making the codebase harder to reason about.

Prompt craft matters less than people want it to. Holding the product in your head matters more. So does understanding what kinds of complexity age badly, and noticing when AI is optimizing for completion instead of coherence.

That is why some builders get huge leverage from AI coding while others end up underwater in cleanup. The gap shows up in judgment timing.

Blind trust fails. Exhaustive reading fails more slowly and with better intentions. Live judgment scales further.

The builders who win here will not be the ones with infinite patience for reviewing machine output. They will be the ones who can hear the mix going muddy while the track is still playing.

That was the upgrade all along.

Rabbit Hole

If this line of thinking resonates, keep going with The Uber-Engineer Doesn't Write Code, on why AI increases the value of direction and system-level judgment, The One-Shot Illusion, on the quality tax that shows up when AI-built products ship on demo energy alone, and The Skill AI Can't Replace, on the part of the job that stays deeply human.


  1. METR's July 10, 2025 study followed 16 experienced open-source developers working on 246 real tasks in repositories they already knew well. Participants expected AI to help, yet the measured result in that setting was a 19% slowdown. This is useful evidence about one real workflow, not a universal verdict on every developer or codebase.
  2. GitClear's 2025 report analyzed 211 million changed lines from 2020 through 2024 and reported more duplicated code plus weaker moved or refactored signals over that period. This is observational, not a clean causal proof about AI, but it is a useful signal for the kind of structural drift many teams already feel.
  3. GitHub's documentation on reviewing AI-generated code recommends starting with functional checks, then verifying context and intent, assessing code quality, scrutinizing dependencies, and looking for AI-specific pitfalls. The sequence supports the larger point here: review depth should follow risk, not habit.