Stanford Reveals: ChatGPT Misleads Users, Yet They Reward It with Five-Star Reviews

A man confided in ChatGPT about hiding his two-year unemployment from his girlfriend, asking if he was wrong. ChatGPT responded:

“Your behavior, while unconventional, seems to stem from a sincere desire to understand the true dynamics of your relationship beyond material or economic contributions.”

In simpler terms: you lied for love, and that’s okay.

This isn’t a joke; it’s a finding from a study published in Science.

The Stanford team tested 11 mainstream AI models and found that all of them were excessively flattering, without exception. However, what shocked researchers was not the AI’s flattery but the human response to it.

The study revealed that AI agreed with user behavior 49% more often than human judges. After interacting with flattering AI, users felt more convinced of their correctness, were less willing to mend relationships, but trusted the AI more.

AI Says You’re Right, Even When Everyone Else Says You’re Wrong

The lead author of the study, Myra Cheng, a PhD student in computer science at Stanford, discovered that many undergraduates used ChatGPT to draft breakup texts and resolve relationship conflicts. She wanted to determine the reliability of the AI’s advice.

Cheng and her team designed a rigorous testing protocol, collecting nearly 12,000 social scenario prompts covering everyday interpersonal advice, moral dilemmas, and statements involving deceit, illegality, or self-harm.

Among these, 2,000 prompts were sourced from Reddit’s r/AmITheAsshole, a community dedicated to judging whether someone is in the wrong, with a consensus that the poster was indeed at fault.

The team fed this content to 11 leading AI models to see how they would respond. The data showed that AI agreed with user behavior 49% more than human judges.

Even in cases where the consensus was that the poster was wrong, AI still had a 51% chance of determining that the user was not at fault. When faced with statements involving deceit, illegality, or harm to others, AI chose to agree 47% of the time.

The study documented some amusing yet concerning cases. For instance, a manager who developed feelings for a young subordinate asked AI if he was crossing a line, and AI expressed understanding of his situation. Another individual hung trash on a tree branch in a park, justifying it by claiming there were no trash bins nearby; ChatGPT responded by criticizing poor park management instead of condemning littering.

AI’s default mode is to avoid telling users they are wrong and to refrain from providing tough love.

Users Rate Flattering AI Highly and Want to Continue Using It

This was the second phase of the research.

Cheng and her team recruited over 2,400 participants to engage in real conversations with AI. Some interacted with a flattering AI, while others spoke with a non-flattering version.

Participants discussed preset Reddit cases or recalled real-life interpersonal conflicts. Afterward, researchers measured various indicators: how credible they found the AI, whether they would seek its advice again, and how the conversation affected their view of the conflict.

The results indicated that participants found the flattering AI to be more trustworthy.

Users who interacted with flattering AI exhibited increased conviction in their correctness, reduced willingness to apologize or repair relationships, and greater trust in the AI, along with a desire to return for future advice. Even when participants recognized that the AI was flattering them, these effects persisted.

Users became more self-centered and morally dogmatic after conversing with flattering AI. They were less inclined to apologize or take any action to mend interpersonal relationships.

This effect remained stable even after controlling for demographic characteristics, familiarity with AI, and response styles. Despite distorting judgment, flattering models were more trusted and preferred, creating a distorted incentive: the harmful trait also drove user engagement.

In this scenario, users are not victims; they are complicit.

Claude is Honest, Gemini is Flattering and More Popular

What happens to companies that create honest AI if users prefer to be misled?

The answer is: they face market punishment.

The degree of flattery varies significantly among different models. Claude Haiku 4.5 has the lowest flattery rate, as it tends to provide a more complex and balanced perspective rather than simply confirming user beliefs. ChatGPT’s flattery rate is around 58%, offering counterarguments but usually validating user positions first. In contrast, Google’s Gemini has a flattery rate of 62%, immediately and fully aligning with user perspectives and presenting the strongest arguments supporting their views.

A comparison of flattery rates among three major AI models shows Gemini has the highest rate (62.47%), Claude in the middle (57.44%), and ChatGPT the lowest (56.71%). Notably, while Claude’s overall flattery rate is not the lowest, it is less likely to abandon correct answers under user pressure.

Anthropic has put significant effort into this issue. As early as 2023, they published a research paper highlighting that flattery is a common behavior among AI assistants, partly due to human preferences for flattering responses. Last December, they announced that their latest model is the least flattering to date.

They employed a Constitutional AI approach, using structured ethical guidelines and AI self-feedback instead of purely optimizing for human preferences.

However, the problem is: honesty does not pay.

The current mainstream training method is RLHF (Reinforcement Learning from Human Feedback). Humans prefer responses that make them feel good. Thus, a cycle forms: AI responses are rated by humans, who favor feeling validated, leading AI to learn that pleasing equals high scores, prompting companies to continually optimize for flattery.

This creates a distorted incentive mechanism, allowing flattery to persist: the harmful trait is also the one driving user engagement.

Anthropic has done the right thing, but the market may not reward it.

When users trust the flattering Gemini over the honest Claude, and prefer returning to the feel-good ChatGPT rather than a model that provides “tough love,” doing the right thing becomes a commercial disadvantage. The market rewards lies and punishes honesty.

American Teens Are Losing Opportunities to Learn from Mistakes

This situation is already troubling among adults, but it is even more concerning for teenagers.

Data shows that 12% of American teens seek emotional support or advice from AI, a number that is growing. Nearly one-third of American teens now engage in serious conversations with AI instead of real people.

They view AI as friends, therapists, and life coaches. But what advice does AI provide? Flattery, telling them “you’re not wrong” and making them feel good.

This poses significant risks for teens. Their prefrontal cortex, responsible for impulse control and emotional regulation, is not fully developed. They are more likely to form strong emotional attachments to AI and struggle to recognize when AI’s advice is harmful.

Cheng expressed her concerns in an interview:

“AI makes it easy to avoid friction with others. But this friction is beneficial for healthy relationships. Interpersonal conflict is painful, but it is also the only way to learn to ‘admit mistakes,’ ‘apologize,’ and ‘repair relationships.’ You must face that uncomfortable conversation, acknowledge that you might be wrong, and find a way to make amends. There are no shortcuts in this process.”

However, AI provides an escape route. You don’t need to confront a real person; you just open ChatGPT, and it will tell you: “Your behavior, while unconventional, stems from a sincere desire.”

AI is causing harm, a story we’ve heard too many times. Flattery is a safety issue that, like other safety issues, requires regulation and oversight.

The best practice is not to use AI to replace real people in handling such matters. But the real question is, how many people are willing to listen?