AI and Code Mastery

AI is fundamentally changing how development work gets done. But I’ve had a nagging concern: as we rely more on AI to write our code, will we end up in a future where we no longer understand the code we’re shipping?

A randomized controlled trial published this week by Anthropic suggests this worry is justified. In the study, 52 software developers were split into two groups, one coding by hand and one with AI assistance, and then quizzed afterward:

“On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades.”

The gap was most pronounced on debugging questions, the exact skill you need when something urgent needs fixing at 2AM.

I’ve spent 15+ years at web agencies, the last ~14 at Fueled (formerly 10up), where most of our work is focused on building solutions for clients. It’s not uncommon for a client to reach out last-minute needing a new feature, an urgent update, or, worst case, a critical bug that needs fixing now.

When I have deep familiarity with a project (the custom code, the dependencies, the hosting infrastructure), I can typically jump in and fix things quickly because I know exactly where to start. But if AI wrote all the code and I didn’t truly understand it during review, I’m starting from zero.

For solo developers or small teams working on a single product, not having this familiarity may be acceptable: if you relied on AI to write all the code, you can always ask the AI to help debug what it wrote.

But in my experience, AI can struggle with issues that span a sprawling codebase, exactly the kind of debugging where the Anthropic study found the biggest performance gaps.

Here’s a recent example I struggled with. We had failing E2E tests on ClassifAI, but only when running against WordPress trunk. I fed the error message and stack trace to both Cursor and Claude Code, along with additional context. Neither could solve it.

I finally dove in myself and discovered the issue was in another plugin we include in our test suite, one that isn’t in our codebase and is only installed when the E2E environment spins up. There may be an approach that would have helped AI find this, but I’ve hit similar walls multiple times. And if this had been a critical production bug, I’d have been stuck if I hadn’t been familiar enough with the system to know where to start looking.
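For context, here’s roughly the kind of setup I mean: a minimal wp-env-style config sketch (not our actual file; the zip URL below is a made-up placeholder) where one plugin is the repo itself and another is pulled from an external zip when the environment starts, so its code never lives in the codebase you’d be searching.

```json
{
  "plugins": [
    ".",
    "https://downloads.wordpress.org/plugin/example-dependency.zip"
  ]
}
```

A plugin declared that way only exists inside the spun-up test environment, which is why neither a grep of the repo nor an AI agent pointed at the repo will ever see its code.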

So what’s the path forward? The Anthropic study offers a clue:

“How someone used AI influenced how much information they retained. The participants who showed stronger mastery used AI assistance not just to produce code but to build comprehension while doing so—whether by asking follow-up questions, requesting explanations, or posing conceptual questions while coding independently.”

For those of us working on teams or shipping code we’ll need to maintain, the answer isn’t to abandon AI; it’s to make sure we collaborate with it effectively. Don’t just accept the output. Ask why it took that approach. Request explanations of the tricky parts. Probe the edge cases.

The 17% comprehension gap isn’t inevitable. It’s the cost of passive consumption. The developers who avoided it were the ones who stayed curious.