Embedding
TL;DR: Meaning turned into numbers a computer can compare.
Picture a city where every word lives at its own address.
"Dog" and "puppy" are next-door neighbors. "Dog" and "tax return" live on opposite ends of town and never run into each other. The wild part: nobody assigned those addresses by hand. The computer worked out where everything should live by reading a mountain of text and noticing what tends to show up near what.
That address is an embedding: a piece of text turned into a long list of numbers that act like map coordinates, dropping it at one exact spot on a giant map of meaning. And the whole point of the map is distance: things that mean similar things land close together, and things that don't end up far apart.
Here's why that matters. The computer has no clue what a dog actually is. It's never petted one. It just knows that the point labeled "dog" sits a stone's throw from "puppy" and a country away from "tax return." Meaning got turned into geography, and geography is something a machine can measure with a ruler.
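If you want to see the "ruler" for yourself, here's a minimal Python sketch. The three-number addresses below are made up for illustration (a real embedding model produces them, usually with hundreds or thousands of numbers, not three), and the two measurements shown, straight-line distance and cosine similarity, are common choices rather than the only ones.

```python
import math

# Made-up 3-number "addresses" for illustration only. A real embedding
# model produces these, usually with hundreds or thousands of numbers.
toy_embeddings = {
    "dog": [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "tax return": [0.05, 0.10, 0.95],
}


def euclidean_distance(a, b):
    """Straight-line distance between two points: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """How closely two vectors point the same way: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


dog = toy_embeddings["dog"]
for word in ("puppy", "tax return"):
    other = toy_embeddings[word]
    print(
        f"dog vs {word}: distance={euclidean_distance(dog, other):.3f}, "
        f"cosine={cosine_similarity(dog, other):.3f}"
    )
```

Run it and "dog" comes out much closer to "puppy" than to "tax return" on both measures. Cosine similarity only cares about which way the arrows point, not how long they are, which is one reason it's a popular pick for comparing text.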
The old way of searching, and why it was dumb.
Old search matched letters. You typed "dog," it hunted for the exact string d-o-g and handed back anything containing it. Useful, but brittle. Search "how do I stop my pup chewing the couch" and a letter-matcher goes looking for those literal words. It has no idea "pup" and "puppy" are the same thing, or that your real question is about a dog destroying furniture.
Embeddings fix that. Your question and, say, a help page about puppies wrecking furniture both get turned into coordinates, and because they mean the same thing, they land in the same neighborhood. The match isn't "you used the same words." It's "you meant the same thing." That's the leap.
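To see the brittleness concretely, here's a deliberately simplified letter-matcher in Python: it just collects the words a query and a document spell identically. The two article titles are invented for the example, and real keyword search engines pile extra tricks on top, so treat this as a caricature of the idea rather than how any particular engine works.

```python
import re


def words(text):
    """Lowercase the text and split it into bare words: all a letter-matcher sees."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_overlap(query, document):
    """Words that appear in both, spelled exactly the same."""
    return words(query) & words(document)


query = "how do I stop my pup chewing the couch"

# Hypothetical article titles, invented for this example.
right_article = "Training a puppy not to destroy furniture"
wrong_article = "How to stop the couch cushions from sagging"

print("right article shares:", sorted(keyword_overlap(query, right_article)))
print("wrong article shares:", sorted(keyword_overlap(query, wrong_article)))
```

The article that actually answers the question shares zero words with it, while the one about sagging cushions shares four ("couch", "how", "stop", "the"), so pure word-matching would rank the wrong page first. A search built on embeddings compares meanings instead, so the puppy article is the one you'd expect to land nearest.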
Where you've already felt this without knowing the word.
This one quiet trick is the engine under a pile of stuff you use constantly:
- Search that actually gets you. You describe a thing badly, in your own words, and it still finds the right result, because it matched meaning, not spelling.
- Recommendations. "More like this" is just the computer grabbing the nearest neighbors on the map.
- Grouping by topic on its own. Hand it a thousand support tickets and it can pile the "where's my refund" ones together and the "app keeps crashing" ones separately, without anyone writing rules for either.
- AI that can answer questions about your own documents. Before the model answers, a search step finds the most relevant chunks of your files by meaning and slips them in alongside your question. Embeddings are how it knows which chunks are relevant.
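"More like this" and "find the relevant chunks" both boil down to the same move: turn the question into coordinates, then grab whatever sits nearest on the map. Here's a minimal Python sketch of that nearest-neighbor step. The chunk titles, their vectors, and the question's vector are all invented for illustration; in a real system an embedding model would produce the vectors, and big collections usually lean on an approximate nearest-neighbor index instead of checking every item one by one like this.

```python
import math


def cosine_similarity(a, b):
    """How closely two vectors point the same way: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, items, k=2):
    """Return the names of the k items closest to query_vec, closest first."""
    ranked = sorted(
        items.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical document chunks with made-up vectors.
chunks = {
    "refund policy": [0.9, 0.1, 0.0],
    "how to request a refund": [0.8, 0.2, 0.1],
    "app crash troubleshooting": [0.1, 0.9, 0.2],
    "office holiday schedule": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of the question "where's my money back?"
question_vec = [0.85, 0.15, 0.05]

print(nearest_neighbors(question_vec, chunks, k=2))
```

With these toy numbers the two refund chunks come out on top and the holiday schedule lands dead last, even though the question never uses the word "refund." The same ranking trick powers "more like this": swap the question's vector for the vector of something you already liked.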
Think of a library where books aren't shelved alphabetically by title, but by what they're about. Everything on gardening clustered in one corner, the cookbooks together across the room, the murder mysteries off in their own dark aisle. You don't need to know a single title. You walk to the right section and everything nearby is related. Embeddings do that for any text you hand them, automatically, no librarian required.
One honest catch, so you treat it right. The map is only as good as the text it learned from. If the training data carried some lazy assumption, the map quietly carries it too. Close on the map means "tends to show up in similar contexts," which is usually the same as "means the same thing," but not always. Worth knowing before you trust it with anything that matters.
Turn meaning into coordinates, and "similar" stops being a vibe. It becomes a distance a computer can measure.