Context window

Everything a model can see in a single call — input and output combined. Think of it as short-term working memory. Once the call is done, it's gone. The model doesn't remember anything between requests.
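That statelessness is easy to see in code. A toy sketch (the `call_model` function is a hypothetical stand-in for a real API call, not any particular provider's client): the model only ever sees the messages you hand it, so every turn must resend the full history.

```python
def call_model(messages):
    """Stand-in for one API call: the model sees ONLY these messages."""
    visible = [m["content"] for m in messages]
    return f"(reply based on {len(visible)} visible messages)"

history = [{"role": "user", "content": "What's a context window?"}]
history.append({"role": "assistant", "content": call_model(history)})

# Next turn: we resend everything — nothing persists on the model's side.
history.append({"role": "user", "content": "Can you elaborate?"})
reply = call_model(history)  # sees all three messages, not just the last
```

If you dropped the first two messages before the second call, the model would answer as if the conversation had just started.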

A 200k token window fits roughly 150,000 words. Sounds like a lot, but it fills up fast. System prompt, tool definitions, conversation history, injected documents — everything competes for the same space.
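The competition is just arithmetic. A rough budget sketch, with entirely illustrative token counts (real numbers depend on your tokenizer and your app):

```python
WINDOW = 200_000  # total context window, input + output

# Hypothetical per-component costs — every item draws from the same pool.
budget = {
    "system prompt": 2_000,
    "tool definitions": 5_000,
    "injected documents": 60_000,
    "conversation history": 100_000,
    "reserved for output": 8_000,
}

used = sum(budget.values())
remaining = WINDOW - used
print(f"{used:,} used, {remaining:,} left")  # → 175,000 used, 25,000 left
```

With numbers like these, a handful of pasted documents eats a quarter of the window before the conversation even starts.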

[Diagram: early in a conversation, the system prompt and tools take a small slice and most of the window is still available; after a long conversation, the window holds the system prompt, tools, recent messages, and model output — the older messages were cut.]

When the window fills up, the model doesn't gracefully forget. The app in front of it — ChatGPT, Claude, your own code — decides what to cut. Usually older messages get dropped or summarized before the next call. The model just sees whatever it's given and has no idea anything is missing.
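One common strategy is sketched below: keep the system prompt, drop the oldest messages until the rest fits. The `count_tokens` heuristic (about four characters per token) is a crude stand-in for a real tokenizer, and the whole function is an illustration of the idea, not any app's actual code.

```python
def count_tokens(text):
    """Very rough estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fit_to_budget(system, messages, budget):
    """Keep the system prompt; drop oldest messages until under budget."""
    kept = list(messages)
    def total():
        return count_tokens(system) + sum(count_tokens(m) for m in kept)
    while kept and total() > budget:
        kept.pop(0)  # oldest first — the model never knows it's gone
    return kept
```

From the model's point of view the next call looks complete; it has no way to tell that the front of the conversation is missing.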

Output has its own cap too. Even a 200k window typically limits output to something like 8–64k tokens. So you can't just hand the model a short prompt and expect it to write a novel in one go.
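What apps do instead is loop: ask for a chunk, append it, ask the model to continue from where it stopped. A toy sketch of that pattern — `generate_chunk` is a hypothetical stand-in for one capped model call, and the cap is in characters here just to keep the example small:

```python
MAX_OUTPUT = 20  # pretend per-call output cap (characters, for simplicity)

def generate_chunk(prompt, written_so_far):
    """Stand-in for one capped model call: continues the text, up to the cap."""
    full_text = "chapter one. " * 5  # the text the model "wants" to write
    start = len(written_so_far)
    return full_text[start:start + MAX_OUTPUT]

draft = ""
while True:
    chunk = generate_chunk("write a novel", draft)
    if not chunk:
        break
    draft += chunk  # each continuation call re-sends the draft as context
```

Note the catch: every continuation call re-sends the growing draft as input, so long outputs burn through the *input* side of the window too.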

The bigger the context, the worse the accuracy. Researchers call this "context rot" — the model doesn't fail on any one position, it just gets less reliable overall as you stuff more in. Keep context focused. More is not always better. That's the whole point of context engineering.

© 2026 siever.ing