A Visual Primer

What the model actually sees.

Not your whole matter. Not last week’s chat. A fixed window of text — and nothing else. Seven stages.

01

A window, not a memory.

[Diagram: inside the window are system instructions, your messages, documents and its own replies. Everything else is invisible: last week's chat, your other matters, files on your machine, what it said yesterday.]

The model has no memory and no filing system. Each time you press send, it is handed one continuous run of text — the context window — and predicts what comes next. Anything outside that window does not exist for the model. Windows are measured in tokens — the word-fragments the model reads instead of whole words.
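To make "tokens" concrete, here is a rough estimator. It is a sketch only, assuming the common rule of thumb of about four characters of English text per token. Real tokenizers split text differently from model to model, so treat the result as a ballpark, not a count. The function name `rough_token_count` is ours, for illustration.

```python
import math

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token count of `text` from its character length.

    Assumption: roughly 4 characters per token for English prose.
    Real tokenizers split text differently, so this is only a ballpark.
    """
    return math.ceil(len(text) / chars_per_token)

instruction = (
    "Throughout this matter, use British English, refer to the Claimant, "
    "never the Plaintiff, and flag anything you cannot verify."
)
print(rough_token_count(instruction))
```

At that rate, a letter of about 4,000 characters comes out near 1k tokens.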

02

Everything shares the same space.

[Diagram: one shared bar split between instructions, documents, conversation and the model's replies.]

Your instructions, every document you paste, the whole conversation so far, and every answer the model has already given all sit in the same window, competing for room. The window is shared — a long document squeezes everything else.

03

Watch it overflow.

[Interactive demo: a 32k-token window, currently 7k / 32k full: Instructions to counsel (3k), Chronology (4k).]

Each block is sized by its share of the window. As you add material and the total passes 32k, the oldest material falls out of the top, starting with your instructions.
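The overflow can be sketched in a few lines of Python. This is a minimal model, assuming the simplest policy this stage describes: blocks arrive oldest first, and once the total passes the limit, whole blocks are dropped from the front. Real products may summarise instead of dropping, or keep system instructions pinned. The function `fit_to_window` and the block names and sizes below are hypothetical, loosely following the sizes in stage 07.

```python
from collections import deque

WINDOW_LIMIT = 32_000  # tokens; the illustrative limit used in this stage

def fit_to_window(blocks, limit=WINDOW_LIMIT):
    """Drop the oldest blocks until the total fits within `limit`.

    `blocks` is a list of (label, tokens) pairs, oldest first.
    Returns (kept, dropped); `dropped` lists labels in the order they fell out.
    """
    kept = deque(blocks)
    dropped = []
    total = sum(tokens for _, tokens in kept)
    while total > limit and kept:
        label, tokens = kept.popleft()  # the oldest block falls out first
        dropped.append(label)
        total -= tokens
    return list(kept), dropped

# Hypothetical material, oldest first (sizes in tokens).
blocks = [
    ("Instructions to counsel", 3_000),
    ("Chronology", 4_000),
    ("Witness statement A", 7_000),
    ("Witness statement B", 7_000),
    ("Skeleton argument", 10_000),
    ("Follow-up questions", 2_000),
]

kept, dropped = fit_to_window(blocks)
print("Dropped:", dropped)  # Dropped: ['Instructions to counsel']
print("Still in the window:", [label for label, _ in kept])
```

The material totals 33k, so one block has to go, and under this policy it is the instructions, simply because they were added first.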

04

Why long chats drift.

Message 1 (your original instruction): “Throughout this matter, use British English, refer to the Claimant, never the Plaintiff, and flag anything you cannot verify.”
Message 22: “Good, now summarise the Respondent’s position on the second ground.”
Message 40: “Draft the covering note for my instructing solicitors.”

When a conversation outgrows the window, the earliest turns are dropped or summarised. The careful instruction you gave in message one may literally no longer be in front of the model by message forty. It hasn’t “forgotten” like a person — the text is simply absent. One matter per chat; restate what matters.

05

Attention thins out.

[Chart: recall by position in the window, from start to end; the middle sags.]

Even inside the window, attention isn’t uniform. Material at the start and end of a long context tends to get more attention than material buried in the middle. Put the thing that matters most at the start or the end — and say it twice if it’s critical.

06

Four practical rules.

1
One matter per chat
Start a fresh conversation for each new matter rather than pivoting an old thread. Everything in the thread shares the window — the old matter's material crowds out the new one.
New matter → new chat. Never re-use Tuesday's thread for Thursday's case.
2
Restate the constraint that matters
Don't rely on message one surviving to message forty. If a constraint is critical, repeat it in the message where you need it applied.
“As before: British English, Claimant not Plaintiff, flag anything you cannot verify.”
3
Front-load or end-load the critical material
Attention tends to be strongest at the start and end of the window. Put the document that matters most first, and the question that matters most last.
Key witness statement first · your actual question last.
4
Batch your questions
Three points in one message beats three follow-ups. Each follow-up adds another turn to the window and pushes the source material further from your question.
“Three points, one message: limitation, remedy, costs.”
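Rules 2 to 4 can be combined when you compose a single message. The sketch below shows one way to do it, assuming plain-text messages: the key document goes first, the restated constraint sits just before the questions, and all questions come last in one numbered batch. `build_message` and its parameters are hypothetical names for illustration, not part of any product's API.

```python
def build_message(key_document, other_documents, constraint, questions):
    """Assemble one message following rules 2-4 (a sketch, not a product feature).

    - Rule 3: the document that matters most goes first.
    - Rule 2: the critical constraint is restated where it applies.
    - Rule 4: all questions are batched together, and they come last.
    """
    parts = [key_document, *other_documents]
    parts.append("As before: " + constraint)
    numbered = [f"{i}. {q}" for i, q in enumerate(questions, start=1)]
    parts.append(f"{len(questions)} points, one message:\n" + "\n".join(numbered))
    return "\n\n".join(parts)

message = build_message(
    key_document="[Key witness statement text]",
    other_documents=["[Chronology text]"],
    constraint="British English, Claimant not Plaintiff, flag anything you cannot verify.",
    questions=["Limitation", "Remedy", "Costs"],
)
print(message)
```

The ordering is the whole point: nothing in the code makes the model pay more attention, it only places the material where attention tends to be strongest.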
07

A million tokens is a bundle, not a library.

A letter: ≈ 1k tokens
A witness statement: ≈ 7k tokens
A skeleton argument: ≈ 10k tokens
A hearing bundle: ≈ 500k–1M tokens

The largest current windows reach around a million tokens: roughly a full hearing bundle, or several long novels. But a bigger window does not guarantee attention. The middle still sags (stage 05), and the model may still lean on some of the material while passing over the rest.

Now you know what it can’t see.

The window explains many disappointing AI answers. The rest come down to what the model does with the space it has, and why it sometimes fills gaps with invention.

Next: Why AI Makes Things Up →