Working with AI

Why does my AI keep agreeing with me?

There is a word for it: sycophancy. AI is built to please, which makes it a flatterer by default. Here is why, and how to get an honest answer out of it.

By Sarah Wood, Creative Sauce AI

Ask your AI whether your idea is any good, and it almost always tells you it is brilliant.

Push back on an answer it got right, and it often folds and changes it. Feed it something half-baked and it helps you build it, enthusiastically, instead of mentioning that it is half-baked.

If you have noticed this and found it slightly unsettling, you are paying attention. There is even a word for it now: sycophancy. Your AI is, by default, a bit of a sycophant.

A tool that agrees with everything feels lovely. It is also close to useless the moment you need an honest second opinion. Here is why it happens, and how to fix it.

Why your AI is sycophantic

These tools are trained to be helpful and agreeable. Out of the box they lean towards telling you what you want to hear rather than what is true. It is not lying, exactly. It is people-pleasing, at scale, with no skin in the game.

The danger is quiet. It is not the obvious wrong answer you catch. It is the bad decision you talk yourself into, with an AI nodding along the whole way.

Tighten the prompt

Most of the fix is in how you ask. Vague questions get agreeable answers. Specific ones get useful ones.

Swap "is this a good idea" for "give me the three strongest reasons this would fail."

Swap "improve this" for "tell me what is weakest about this, and why."

Swap "what do you think" for "argue the opposite case as convincingly as you can."

Then add one line you should use far more than you do: "be honest, not encouraging." It sounds small. It changes the answer.

The pattern is simple. Do not ask it to judge; ask it to attack. You will learn more from a good critique than from a hundred compliments.
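If you find yourself typing the same swaps over and over, you can keep them in a small script. Here is a minimal sketch in Python. The function name critical_prompt and the fallback wording are made up for illustration, and it only builds the text of the prompt. It does not talk to any AI tool.

```python
# Hypothetical helper: turns a vague, approval-seeking ask into a critical one.
# The rewrites are the three swaps described above.
CRITICAL_REWRITES = {
    "is this a good idea": "Give me the three strongest reasons this would fail.",
    "improve this": "Tell me what is weakest about this, and why.",
    "what do you think": "Argue the opposite case as convincingly as you can.",
}

HONESTY_LINE = "Be honest, not encouraging."


def critical_prompt(vague_ask: str, subject: str) -> str:
    """Build a prompt that asks the model to attack rather than judge."""
    # Normalise the ask so "Is this a good idea?" matches the table key.
    key = vague_ask.strip().lower().rstrip("?.!")
    # Fall back to a generic critique request (assumed wording) if the ask is unknown.
    instruction = CRITICAL_REWRITES.get(
        key, "List the most serious problems with this, most serious first."
    )
    return f"{instruction} {HONESTY_LINE}\n\n{subject.strip()}"


if __name__ == "__main__":
    print(critical_prompt("Is this a good idea?", "Plan: launch a paid newsletter next month."))
```

Paste the result into whichever AI tool you use. The point is not the script. It is that the critical wording becomes your default rather than something you have to remember.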

The geeky bit

The agreeableness is not an accident. It is baked in during training. After a model learns to predict text, it is polished with a step called reinforcement learning from human feedback, or RLHF. People are shown pairs of answers and pick the one they prefer, and those preferences are distilled into a reward model, a scoring system that estimates how much a human would like any given reply. The model is then tuned to chase a high score.

Here is the catch: people tend to rate agreeable, validating, confident answers higher than blunt or critical ones, even when the blunt one is more correct. So the reward model quietly learns that pleasing the user is what good looks like, and the model learns to default to it. That is sycophancy. It is not lying. Nobody set out to build a flatterer. It is optimisation doing what it was pointed at, chasing approval rather than truth.

Knowing the mechanism is what makes the fix obvious: when you ask it to argue against you or hand it a fixed standard to meet, you are giving it a target other than your approval, and the flattery has nowhere to go.
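For the properly curious, the reward model step can be shown with a toy example. This is a sketch under heavy assumptions, not how any real system is built. Each answer is reduced to two invented yes/no features (correct, agreeable), the rater counts are made up, and the score is a simple weighted sum fitted with a Bradley-Terry style preference model, which is one common way to learn from "A or B?" comparisons. Real reward models are neural networks that score whole replies.

```python
import math

# Each answer is described by two made-up binary features: (correct, agreeable).
BLUNT_CORRECT = (1, 0)
AGREEABLE_CORRECT = (1, 1)
AGREEABLE_WRONG = (0, 1)
BLUNT_WRONG = (0, 0)

# Synthetic comparisons: (answer_a, answer_b, times_a_chosen, times_b_chosen).
# The counts are invented to mimic raters who favour a pleasant tone.
COMPARISONS = [
    (BLUNT_CORRECT, BLUNT_WRONG, 6, 4),        # same tone, correctness differs
    (AGREEABLE_CORRECT, BLUNT_CORRECT, 8, 2),  # same correctness, tone differs
    (AGREEABLE_WRONG, BLUNT_CORRECT, 7, 3),    # flattery versus accuracy
]


def reward(weights, answer):
    """Linear reward: weighted sum of the answer's features."""
    return sum(w * x for w, x in zip(weights, answer))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_reward_model(comparisons, steps=2000, lr=0.5):
    """Fit the two weights by gradient ascent on the Bradley-Terry log-likelihood."""
    weights = [0.0, 0.0]
    total = sum(a_wins + b_wins for _, _, a_wins, b_wins in comparisons)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for a, b, a_wins, b_wins in comparisons:
            # Modelled probability that raters prefer answer a over answer b.
            p_a = sigmoid(reward(weights, a) - reward(weights, b))
            # Observed wins for a, minus the wins the model expected.
            surplus = a_wins - (a_wins + b_wins) * p_a
            for i in range(2):
                grad[i] += surplus * (a[i] - b[i])
        weights = [w + lr * g / total for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    w_correct, w_agreeable = fit_reward_model(COMPARISONS)
    print(f"weight on correct:   {w_correct:.2f}")
    print(f"weight on agreeable: {w_agreeable:.2f}")
```

With these invented counts, the fitted weight on agreeable comes out larger than the weight on correct, so a pleasant wrong answer scores higher than a blunt right one. A model tuned to chase that score would learn to flatter, which is the catch described above in miniature.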

Give it a standard to meet

Better prompts get you a long way. The bigger fix, the one that makes AI reliably honest rather than occasionally honest, is giving it a standard to meet.

Instead of "write me a good X," you define what good actually means: the rules, the criteria, the non-negotiables it has to hit every time. Now it is measured against something real, not just trying to keep you happy. The flattery has nowhere to go.

Setting those standards well is a fair chunk of what I do, and it is the difference between a tool that agrees with you and one you can rely on. You do not need the full version to feel the benefit. Even a rough "it must do these three things, and here is what a bad answer looks like" lifts the quality straight away.
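That rough version, "it must do these three things, and here is what a bad answer looks like," is easy to turn into a reusable template. The sketch below is one way to do it in Python. The function name standard_prompt, its parameters, and the exact wording of the template are assumptions for illustration, not a fixed recipe.

```python
def standard_prompt(task, must_do, bad_answer):
    """Build a prompt that states the standard up front and asks for a check against it."""
    if not must_do:
        raise ValueError("give at least one criterion")
    # Number the criteria so the model can refer back to each one.
    criteria = "\n".join(f"{i}. {rule}" for i, rule in enumerate(must_do, start=1))
    return (
        f"Task: {task}\n\n"
        f"A good answer must do all of the following:\n{criteria}\n\n"
        f"A bad answer looks like this: {bad_answer}\n\n"
        "After your answer, go through the criteria one by one and say plainly "
        "whether each is met. Be honest, not encouraging."
    )


if __name__ == "__main__":
    print(
        standard_prompt(
            task="Write a product description for a reusable coffee cup.",
            must_do=[
                "State the price.",
                "Stay under 80 words.",
                "Make one specific claim a customer could check.",
            ],
            bad_answer="Generic praise with no price and nothing a customer could verify.",
        )
    )
```

Asking for the criterion-by-criterion check at the end is a design choice: it gives the model something to be measured against other than whether you seem pleased.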

The habit to build

Treat your AI like a sharp colleague who is desperate to be liked. Brilliant, fast, full of ideas, and far too keen to agree with you. Your job is to keep nudging it off the easy yes.

Ask for the weaknesses. Ask for the opposing view. Give it the standard. Do that and the sycophancy stops being a trap and becomes just another quirk you can work around.

Agreeable AI is not broken. It is doing exactly what it was built to do. The skill, as ever, is knowing it does this, and asking in a way that gets you the truth instead of a pat on the head.

If you would rather not have to fight your tools for an honest answer, building that reliability in is exactly what we do.


Common questions

Why does my AI agree with everything I say?

It is trained to be helpful and agreeable, so it leans towards telling you what you want to hear. The behaviour even has a name: sycophancy. To get an honest answer, ask it to argue against you or give reasons something would fail, rather than asking whether your idea is good.

What is AI sycophancy?

Sycophancy is an AI's tendency to flatter and agree with the user rather than give an accurate or critical answer. It happens because the models are tuned on human ratings, and people tend to rate agreeable answers higher than blunt ones.

How do I make AI give me honest feedback?

Ask it to attack, not judge. Use prompts like "give me the three strongest reasons this would fail" and add "be honest, not encouraging." Define what a good answer must include so it has a standard to meet.

Why does AI change its answer when I push back?

Because it is built to please. If you challenge a correct answer it often assumes you want a different one and folds. Ask it to defend its reasoning instead, so you can tell whether it was right.