Working with AI

I design systems, not prompts

There is a job that does not really have a name yet, and it is the one I do. Not prompts. Not AI strategy. Systems. The model is maybe a tenth of it. The design around it is the rest.

You have probably tried to fix an AI tool by writing a better prompt. Cleverer wording, a few more rules, an example or two pasted in. It works for a minute. Then the same thing wobbles again the next day.

It is not your prompt. The prompt was never going to carry it on its own.

I work alongside these tools every day, and the part that has stuck with me is how little of a working build is the prompt. The clever sentence is the easy bit. The hard bit sits all around it.

A prompt is a sentence. A system is a decision.

A prompt asks the model to do something once. A system decides where the work actually lives, what a person still touches, and what happens on the morning it goes wrong.

Those are different jobs. One is writing. The other is closer to plumbing, or architecture, or just plain judgement applied early and held firmly.

When someone says AI did not work for them, they almost always mean the prompt did not hold. The system was never built. There was nothing for the clever sentence to lean on.

The model is about a tenth of it

This is the bit nobody seems to be posting about. The model you pick matters far less than people think. Swap one for another and most of your problems are still sitting there waiting for you.

The other nine tenths is design. Where does the data come from? What does the AI never get to decide on its own? What does it do when it is unsure, rather than guessing? Who sees the output before a customer does? Where is the off switch?

Get those right and an average model behaves. Get them wrong and the best model on the market still embarrasses you on a Monday.

What the other nine tenths actually looks like

Let me make it concrete, because "the design around the model" stays vague until you see it. Say you want AI to answer your customers' questions. The model writing the reply is the tenth. Here is the rest.

It has to answer from your real prices, your real policies, your real availability, not a confident guess it invented. It needs a hard line it will never cross, like quoting a number it is not sure of. It needs to know which questions it handles itself and which get quietly handed to a person. It needs something sensible to do when it does not know, instead of making something up to look helpful. It needs a home, your site, your inbox, wherever your customers already are. And it needs a plan for the moment it gets something wrong, because one day it will.

None of that is the model. All of it is decisions, made in advance, before a single customer ever sees the thing. That is the design.
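A rough sketch of those advance decisions, in Python, might look like the block below. Everything in it is made up for illustration: the FACTS table stands in for your real prices and policies, HUMAN_ONLY is a hypothetical list of topics a person always handles, and the keyword matching is deliberately crude. In a real build the model would word the reply. The point is only that the routing and the "I do not know" path are decided before the model is involved.

```python
from dataclasses import dataclass

# Hypothetical facts store. In a real build this is your price list or policy documents.
FACTS = {
    "delivery": "Standard delivery takes 3 to 5 working days.",
    "returns": "Items can be returned within 30 days with a receipt.",
}

# Hypothetical topics that are always handed to a person.
HUMAN_ONLY = {"refund", "complaint", "discount"}


@dataclass
class Reply:
    route: str  # "auto" if answered from a stored fact, "human" if handed over
    text: str


def answer(question: str) -> Reply:
    """Answer only from stored facts and hand everything else to a person."""
    words = set(question.lower().replace("?", "").split())

    # The hard line: some topics are never answered automatically.
    if words & HUMAN_ONLY:
        return Reply("human", "Passing this to a colleague who can help.")

    # Grounding: reply only when a stored fact covers the topic.
    for topic, fact in FACTS.items():
        if topic in words:
            return Reply("auto", fact)

    # The sensible thing to do when it does not know: say so and hand over.
    return Reply("human", "I am not sure, so I have asked a colleague to reply.")
```

Calling answer("How long does delivery take?") returns the stored delivery fact with route "auto". A question about a refund, or about anything the facts table does not cover, comes back with route "human".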

And the shape barely changes from job to job. Swap "answer customer questions" for drafting quotes, sorting an inbox, or turning a pile of data into a decision, and you are asking the same questions every time. Where does it get its facts? What is it never allowed to decide alone? Who checks it? What happens on the bad day? The model does the visible bit. The system is everything that makes the visible bit safe to rely on.

The geeky bit

A reliable build is less about the model and more about the scaffolding around it. A system prompt fixes the rules and the role so the behaviour does not drift between sessions. Retrieval, often called retrieval-augmented generation or RAG, lets it answer from your own documents rather than its general memory, so it is grounded in your facts. A validation layer checks every output against your non-negotiables and rejects anything that breaks one before it reaches a person. Decomposition splits a big ask into smaller checked steps that hold together far better than one giant generation. And a clear human checkpoint keeps a person on the one decision that genuinely needs judgement. The model sits in the middle of all that. It is the smallest part of the work, not the largest.
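To make the validation layer less abstract, here is a minimal sketch in Python. It assumes two hypothetical non-negotiables: a draft may only quote prices from a known list, and it may never use certain phrases. KNOWN_PRICES, BANNED_PHRASES, validate and release are names invented for this example, not part of any library, and a real build would have more rules and better parsing than this.

```python
import re

# Hypothetical non-negotiables. Replace with your own rules.
KNOWN_PRICES = {"49.00", "120.00"}              # the only prices a draft may quote
BANNED_PHRASES = ("guarantee", "legal advice")  # wording a draft may never use


def validate(draft: str) -> list[str]:
    """Return every rule the draft breaks. An empty list means it passes."""
    problems = []

    # Any amount written with two decimals must be on the known price list.
    for amount in re.findall(r"\b\d+\.\d{2}\b", draft):
        if amount not in KNOWN_PRICES:
            problems.append(f"unknown price: {amount}")

    lowered = draft.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"banned phrase: {phrase}")

    return problems


def release(draft: str) -> str:
    """Send only drafts that pass every check. Hold the rest for a person."""
    return "send" if not validate(draft) else "hold_for_human"
```

A draft quoting 49.00 passes and is sent. A draft quoting 45.00, or promising a guarantee, is held for a person with the reason attached. The check runs on every output, whichever model wrote it.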

What happens when it goes wrong is the design

Anyone can build the version that works in the demo. The demo always works. That is what a demo is for.

The real design question is the unglamorous one. What happens on the bad day? When the input is strange, when the model is unsure, when two of your rules quietly disagree. A good system has an answer ready for all of that. A prompt just hopes.
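Having an answer ready can be as plain as a small decision function written before launch. The Python sketch below is one way to express it, under loud assumptions: the confidence score is hypothetical (models do not hand you a trustworthy one for free, so you would have to estimate it somehow), the thresholds are arbitrary, and rule_verdicts is an invented mapping from rule name to whether that rule approved the draft.

```python
from enum import Enum


class Action(Enum):
    SEND = "send"
    ASK_CLARIFY = "ask_customer_to_clarify"
    ESCALATE = "escalate_to_human"


# Hypothetical thresholds. Tune them for your own build.
MIN_CONFIDENCE = 0.8
MAX_INPUT_CHARS = 2000


def decide(user_input: str, confidence: float, rule_verdicts: dict[str, bool]) -> Action:
    """Pick what happens next, with every bad-day case decided in advance."""
    text = user_input.strip()

    # Strange input: empty or far too long. Ask rather than guess.
    if not text or len(text) > MAX_INPUT_CHARS:
        return Action.ASK_CLARIFY

    # If any rule objects, including when rules disagree with each other,
    # a person makes the call.
    if not all(rule_verdicts.values()):
        return Action.ESCALATE

    # The model is unsure: hand over instead of sending a guess.
    if confidence < MIN_CONFIDENCE:
        return Action.ESCALATE

    return Action.SEND
```

The order of the checks is itself a design decision. Here, strange input is caught first, a rule objection always beats a confident model, and sending is what is left only when nothing else has fired.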

That answer is the job. Deciding it in advance, in cold blood, is most of what I actually do.

Why I keep saying it this way

I call it designing systems because the words we usually reach for, prompting, AI strategy, do not describe it. One is too small. The other is too vague to build from.

The honest description is closer to this. I decide where the work lives, what the human keeps, and what the thing does when it fails. The model helps. It does not decide. And the part nobody photographs, the design around it, is the part that decides whether any of this holds up at nine in the morning when it actually matters.

If your AI works in the demo and wobbles in real life, the fix is usually the system around it, not a better prompt. That is the part we design.

Book a quick chat →

Related: What a year living inside these tools actually taught me.

Common questions

What is the difference between a prompt and an AI system?

A prompt is a single instruction to the model. A system is the design around it: where the data comes from, what a human still touches, what the AI is never allowed to decide alone, and what happens when something goes wrong. The prompt is a small part of a working build. The system is the rest.

Why does my AI work in a demo but not in real use?

Because the demo only ever shows the easy path. Real use brings strange inputs, edge cases and bad days that a one-off prompt was never built to handle. A proper system has an answer ready for those before they happen, which is why it holds up where a prompt drifts.

Does the choice of AI model matter most?

Less than people expect. The model is roughly a tenth of a reliable build. Swapping one model for another rarely fixes the real problems, because those live in the design around it: grounding, validation, human checkpoints and a clear plan for failure.

What does designing AI systems actually involve?

Deciding where the work lives, what the human keeps, how the AI stays grounded in your own facts, how every output gets checked, and exactly what happens on the bad day. It is closer to architecture and judgement than to writing clever sentences.