define latency --plain-english
Latency
TLDR:You click something and there's a beat, a small hang, before anything happens.
You click something and there's a beat, a small hang, before anything happens. That pause has a name, and chasing it down is a huge part of what makes some apps feel instant and others feel sluggish. The pause is latency.
Latency is the delay between asking for something and starting to get it. Not how much you get, not how fast it flows once it starts, just the wait itself. (People sometimes mean the wait before the very first response, sometimes the whole round trip until it's done. Either way it's the wait, not the amount.) It's measured in milliseconds, and you feel it constantly even when you can't name it: the lag after you hit send, the spinner before a page paints, the half-second before an AI starts typing.
A long garden hose makes it concrete. Turn the tap and water doesn't appear at the far end instantly. There's a delay while it travels the length of the hose. Latency is that travel time, the gap before the first water shows up. It's separate from how fat the hose is (that's bandwidth, how much can flow at once). A fire hose with a long run still makes you wait for the first drop. You can have tons of bandwidth and still feel laggy, because the two are different problems.
Where the delay actually comes from, in plain terms: distance and work. Distance, because a request to a server on the other side of the planet physically takes longer than one nearby, signals move fast but not instantly. Work, because the thing answering might need a moment to do something, like a model running inference before it can reply. Add them up and that's your wait.
This is why two of these ideas exist almost entirely to fight latency. A CDN keeps copies of your site close to every visitor, so the "hose" is short instead of crossing an ocean. The edge runs code near your users for the same reason. Both are bending over backward to shrink that delay, because latency is what users feel as "this is slow," even when everything is technically working.
Why it's worth knowing as a non-coder: it gives you the right words when something feels off. "It's slow" is vague. "There's high latency before it responds, but once it starts it's fine" points straight at the problem, the wait, not the flow. And it sets honest expectations: some latency you can shrink (move things closer, do less work up front), and some you simply can't (an AI doing real reasoning will take a beat, and that beat is the work, not a bug).
Latency is the wait before the first drop, the travel time through the hose, not how much water comes out. It's the difference between an app that feels instant and one that feels like it's thinking. Shrinking it is half of what "make it fast" really means.