Introduction
When the brain is no longer burdened, the technical skills begin to atrophy.
The phrase “natural language is the new programming language” has been widely embraced over the past year. Enthusiasm for “vibe coding,” a concept popularized by former Tesla AI director Andrej Karpathy, has reached a peak: the idea that you need not understand syntax or implementation, only describe what you want to the AI and check whether the vibe feels right.
It seems that the barriers for programmers are being lowered.
However, last week Anthropic, the company behind Claude, one of the most popular models for vibe coding, threw cold water on this fervor. It published a rigorous paper, “How AI Affects Skill Formation,” with a harsh conclusion: leaning too heavily on AI while learning something new may not make you any faster, and it can significantly erode your core skills.
In fact, you might be turning into a “half-baked” engineer.
The Study
Anthropic’s researchers recruited over 50 experienced Python programmers for a controlled study culminating in a closed-book exam. The task: learn Trio, a little-known Python library for asynchronous programming, and complete a series of programming tasks with it, simulating the real-world scenario in which a programmer is suddenly handed an unfamiliar tool or framework.
The programmers were divided into two groups:
- Manual Group: Allowed only to consult official documentation and Google, strictly prohibited from using AI.
- AI Group: Equipped with a powerful AI assistant based on GPT-4o, capable of answering questions, writing code, and fixing bugs.
After completing the tasks, all participants took an exam designed to assess their learning outcomes, covering programming syntax, code logic understanding, reading ability, and debugging skills.
The initial assumption was that the AI group, armed with a GPT-4o-level assistant, would outperform the manual group. The results, however, left everyone silent.
Results
The most striking outcome was that the AI group scored an average of 17% lower than the manual group. The paper specifically noted that the largest gap was in debugging skills. This is not surprising: the biggest drawback of vibe coding is that users often do not understand how the code runs, leaving them unable to troubleshoot it.
Many vibe-coding enthusiasts might argue, “Okay, I admit I’m less skilled, but at least I’m faster!” Unfortunately, Anthropic’s data contradicts this claim too. Total task-completion time showed no statistically significant difference: the AI group averaged 23 minutes, the manual group 24.7 minutes.
Why? The paper points to a neglected time cost: the “interaction tax.” Some programmers spent excessive time crafting prompts to coax perfect code out of the AI. The data showed that some spent as long as 11 minutes just chatting with the AI, or devoted 30% of a 35-minute task to figuring out how to ask.
The Dangers of Vibe Coding
The AI group easily fell into a cycle of iterative debugging: the AI generates code, errors occur, they ask the AI to fix them, and the loop of errors and fixes never ends. The project ultimately degenerates into an unmaintainable tangle of “spaghetti code,” a “black box” whose internal structure no one understands.
As time passed, programmers found themselves in a state of “waiting for results,” neither saving time nor learning anything.
You might be disenchanted with Vibe Coding by now, but the most intriguing part of the paper is that it categorized AI users into six types based on their interactions. While the AI group had lower average scores, the variance within the group was significant. Some users struggled, while others excelled. The difference lay in how they used AI.
User Profiles
The first category consists of low-performing users, dubbed “AI slackers,” who scored below 40% (a failing grade); the paper further divides them into three subtypes based on their interaction patterns.
The second category paints a brighter picture: despite using AI, these users scored on par with the manual group (65%–86%), because they had found a symbiotic way of working with the AI.
Why is there such a disparity among users of the same AI? Perhaps it is not that AI has diminished programmers’ skills, but rather that we succumb to the temptation of “taking the easy way out.”
Cognitive Offloading
Anthropic’s report touches on a psychological concept: cognitive offloading. When tools are powerful enough, we subconsciously offload tasks that require brain processing—like computation, memory, and logical reasoning—onto the tools, similar to how we might rely on autopilot.
In the AI era, we are offloading our “understanding” to large models. The paper uses the metaphor of AI as an “exoskeleton”—when you wear it, you feel immensely powerful, capable of lifting heavy weights. However, muscle growth requires resistance and strain; if you wear it too long without taking it off, your muscles will atrophy due to lack of stimulation.
The Illusion of Ease
The paper reveals a telling statistic: error frequency. The manual group encountered an average of three errors per person, forcing them to stop, examine the red error messages, consult documentation, and think through questions like “why is there a type mismatch?” or “why didn’t the task suspend?” The AI group, by contrast, faced only one error on average, since the AI usually handed them code that ran smoothly.
This might sound like an advantage of AI, but Anthropic’s researchers argue that this is precisely the root of the problem. The paper states, “Encountering and independently solving errors is a crucial part of skill formation.” The manual group learned well because they experienced “friction”—each error presented a resistance that forced their brains to construct deep mental representations.
In contrast, the AI group’s experience was too “smooth.” The cost is that they lost their grip on reality: without the exoskeleton, they wouldn’t know how to walk.
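The paper does not reproduce participants’ actual errors, but a hypothetical “type mismatch” moment of this kind shows why friction teaches: the traceback itself names the concept the learner is missing (the function and values below are invented for illustration):

```python
def task_label(n):
    # BUG: concatenating str and int raises TypeError at runtime
    return "task-" + n

try:
    task_label(3)
except TypeError as err:
    # The moment of friction: reading the message
    # ("can only concatenate str (not \"int\") to str")
    # forces the learner to confront the str/int distinction.
    print(type(err).__name__)  # TypeError

def task_label_fixed(n):
    # The fix the learner writes themselves: convert explicitly
    return "task-" + str(n)

print(task_label_fixed(3))  # task-3
```

An AI assistant would likely emit the fixed version on the first try, so the learner never sees the error, never reads the message, and never builds the underlying mental model.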
This “smoothness” of AI is not limited to programming; it is spreading to various aspects of our lives. In programming, it eliminates the pain of debugging, misleading you into thinking you have mastered the system; in creative endeavors, it removes the tedium of brainstorming, making you believe you possess creativity; in interpersonal relationships, it even reduces friction.
Conclusion
The allure and danger of Vibe Coding lie in its creation of a “happy but ignorant” illusion. Participants in the study reported that tasks felt “easier” with AI, while the manual group found them difficult and painful. However, the reversal was stark: those who found tasks “easy” performed poorly in subsequent tests, while those who found them “difficult” reported a greater sense of learning and growth, scoring higher.
Thus, vibe coding may make you feel like a genius while the code flows, but when it breaks, you realize you have merely been groping in the dark. In the face of the unknown, AI treats everyone equally: every mind that has stopped exercising comes up short, however brilliant it once was.
The study also indicates that even seasoned engineers with over seven years of experience scored lower when relying on AI in a new technical domain.
Anthropic’s paper is not a call to abandon AI but a survival guide for the AI era. To avoid being hollowed out by it, we need to change our habits and learn from the high-scoring group in the report:
- Ask “why” more; say “help me do it” less.
- Review AI-generated code line by line, as you would a colleague’s code.
- Treat debugging as a learning opportunity: when you hit a bug, spend five minutes analyzing it yourself instead of sending a screenshot to ChatGPT after five seconds.
AI can indeed make us faster, but only if we know where we are going and how to fix the car when it breaks down. After all, when autopilot fails, only those who remember how to steer can save everyone in the vehicle.