define prompt-caching --plain-english
Prompt Caching
TLDR: Pay once to read it, reuse it cheaply.
Watch a chef during the dinner rush. The order comes in and they don't go hunting for an onion. It's already diced, the stock's already simmering, the sauces are sitting in little containers ready to go. That prep has a name: mise en place.
Caching is the AI's mise en place.
Here's the problem it solves. The AI re-reads its entire room of context every single turn. Your long instructions, the giant document you pasted, the setup at the top. Every time you hit send, it reads all of it again, top to bottom, like it's the first time. That's slow, and if you're paying per chunk of text, it's expensive. You're paying to re-chop the same onion on every turn.
Caching preps the stuff that doesn't change and holds it ready.
The parts of your conversation that stay the same get processed once and set aside. Next turn, instead of reading all of it from scratch, the AI grabs the prep that's already done and only deals with what's new (your latest message). It doesn't re-chop the onion. It reaches for the bowl where the diced onion is already waiting.
You feel this in two ways:
- Replies come back faster. Less to re-read means less time staring at the little thinking dots.
- It's cheaper. If you're using AI through an API, reused prep costs a fraction of fresh text. Roughly a tenth of the price for the cached part.
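To put rough numbers on the "tenth of the price" idea, here is a minimal sketch. The price per token, the token counts, and the exact 0.1 discount are all made-up assumptions for illustration; real rates vary by provider, and some providers also charge a little extra the first time the prep is written, which this sketch ignores.

```python
# Assumption: cached text costs about a tenth of fresh text, as described above.
CACHED_DISCOUNT = 0.1


def turn_cost(stable_tokens, new_tokens, price_per_token, cache_hit):
    """Cost of one turn: the stable part is discounted only on a cache hit."""
    if cache_hit:
        stable_rate = price_per_token * CACHED_DISCOUNT
    else:
        stable_rate = price_per_token
    return stable_tokens * stable_rate + new_tokens * price_per_token


# Hypothetical numbers: 10,000 tokens of instructions, a 100-token question.
cold = turn_cost(10_000, 100, price_per_token=0.000003, cache_hit=False)
warm = turn_cost(10_000, 100, price_per_token=0.000003, cache_hit=True)
print(f"first turn: {cold:.4f}, later turns: {warm:.4f}")
```

The new question is always charged at the fresh rate. Only the part that stayed the same gets the discount.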
The catch: the prep doesn't sit out forever. Walk away for a while and the kitchen tosses it. Come back later and the onion gets chopped fresh again, at full price. (The prep usually stays good for a few minutes of inactivity, so a fast back-and-forth keeps reusing it.)
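The "tossed after a while" behavior can be sketched as a timer that resets every time the prep is used. This is a toy model, not any provider's actual implementation: the `PrepCache` name and the 300-second limit are assumptions standing in for "a few minutes".

```python
# Assumption: "a few minutes" is modeled as 300 seconds; real limits vary.
TTL_SECONDS = 300


class PrepCache:
    """Toy model: the prep stays good as long as it was used recently."""

    def __init__(self, ttl=TTL_SECONDS):
        self.ttl = ttl
        self.last_used = None  # no prep exists before the first turn

    def use(self, now):
        """Return True on a hit. Either way, the timer restarts from `now`."""
        hit = self.last_used is not None and (now - self.last_used) <= self.ttl
        self.last_used = now
        return hit


cache = PrepCache()
first = cache.use(now=0)             # nothing prepped yet, so a miss
quick_reply = cache.use(now=60)      # one minute later, prep still good
another = cache.use(now=120)         # timer was reset at 60, still good
after_break = cache.use(now=1000)    # long gap, prep was tossed
print(first, quick_reply, another, after_break)
```

A steady back-and-forth keeps resetting the timer, which is why a fast conversation keeps hitting the cache.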
The lesson if you're building anything: put the big, stable material at the top and keep it steady.
- The long instructions, the reference document, the rules: stack those at the top and don't touch them.
- The thing that changes (the actual question) goes at the bottom.
Why does the order matter? The prep only holds as long as nothing above it moves. Change one word near the top and the AI has to re-chop everything below it, because the onion underneath isn't the same onion anymore. Keep the steady stuff steady and you keep the prep.
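One way to picture why order matters: treat the prompt as a stack of blocks and count how many blocks from the top are exactly the same as last turn. This is a simplified sketch with a hypothetical `reusable_prefix` helper; real systems compare at a finer grain, but the "stops at the first difference" idea is the same.

```python
def reusable_prefix(previous, current):
    """Count the leading blocks that are identical in both prompts."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first difference ends the reusable part
        count += 1
    return count


last_turn = ["rules", "reference document", "question 1"]

# Only the question at the bottom changed: the top two blocks are reusable.
steady_top = ["rules", "reference document", "question 2"]

# One change at the very top: nothing below it can be reused.
edited_top = ["rules (edited)", "reference document", "question 2"]

print(reusable_prefix(last_turn, steady_top))
print(reusable_prefix(last_turn, edited_top))
```

The reference document is identical in both cases, but in the second one it sits below a change, so it has to be processed again.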
Most apps do this for you quietly, so you never have to think about it. Everything you leave unchanged is prep you don't pay to redo.