Next Token Prediction

📖 Lesson content

What you'll learn

Estimated time: 30 minutes

By the end of this lesson you'll be able to:

Explain Next Token Prediction as the core mechanism of generative AI and why it produces both fluency and hallucination
Locate tasks on the Next Token Prediction continuum (well-worn path vs. novel territory)
Identify specificity (names, dates, citations, statistics) as the zone where fabrication concentrates
Recognize product features (citations, uncertainty signaling, constrained generation, generator-verifier pattern) that mitigate this limitation

How AI models use next token prediction

(4 minutes)

Generative AI is closer to a vastly sophisticated autocomplete than to a search engine. It writes answers one token (a word or word fragment) at a time, based on what tends to follow what. That single property gives you both the fluency and the hallucination.
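To make "what tends to follow what" concrete, here is a toy word-level autocomplete in Python. It is a minimal sketch under loud assumptions: the corpus is invented, and it only counts which word follows which (a bigram model), whereas real models use neural networks over tokens and the whole preceding context. It still shows the key property: the procedure returns the most common continuation, not the true one.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus. Note: the report was published in 2020.
CORPUS = [
    "the paper was published in 2019",
    "the paper was published in 2019",
    "the paper was published in 2021",
    "the report was published in 2020",
]


def build_bigram_counts(sentences):
    """Count how often each word directly follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(prompt, counts, max_words=10):
    """Greedily append the most frequent next word until no continuation is known."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


counts = build_bigram_counts(CORPUS)
print(complete("the report", counts))  # the report was published in 2019
```

The corpus says the report came out in 2020, yet the completion says 2019, because "in 2019" is the most frequent pattern. This toy exaggerates the effect by looking only at the previous word, but the point carries over: nothing in the procedure checks whether the continuation is true.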

Autocomplete at scale

Before you read

Where do you think "summarize a long report" sits on the Next Token Prediction continuum? Place your guess before reading on.

Capability end: well-worn paths (summarize, reformat, explain common concepts)
Limitation end: novel territory, sparse patterns, "true vs. sounds true"

What this enables

Fluent, natural-sounding text in virtually any style or format
Rapid synthesis of ideas across distant fields
Strong performance on tasks resembling what the model has seen before
Coherent continuation of any thread (a story, an argument, a block of code)

Where it characteristically fails

Hallucination: the plausible continuation isn’t always the true one
Confabulation: fills gaps with plausible material rather than flagging them
Inconsistency: sampling means the same prompt can yield different outputs
Misplaced confidence: smooth prose can wrap a guess
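Inconsistency comes from how the next token is chosen. Rather than always taking the top prediction, generation typically samples from a probability distribution over candidates. The sketch below assumes made-up scores (logits) for three candidate years and converts them with a softmax that takes a temperature parameter; the names `LOGITS`, `softmax`, and `sample_next` are illustrative, not any product's API. Different random seeds stand in for different conversations with the same prompt.

```python
import math
import random

# Hypothetical scores (logits) a model might assign to candidate next tokens.
LOGITS = {"2019": 2.0, "2020": 1.5, "2021": 1.0}


def softmax(logits, temperature=1.0):
    """Convert scores to probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next(logits, temperature, rng):
    """Draw one next token in proportion to its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Each seed stands in for a fresh conversation with the same prompt.
for seed in range(3):
    rng = random.Random(seed)
    draws = [sample_next(LOGITS, temperature=1.0, rng=rng) for _ in range(5)]
    print(f"run {seed}: {draws}")
```

Lowering the temperature concentrates probability on the top candidate and makes runs more repeatable; it does not make that candidate any more likely to be true.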

Product features that push the edge out

Citations & source grounding: trace what’s backed vs. generated
Uncertainty signaling: the model flags its own shakiness
Constrained generation / skills: narrow the space where fabrication sneaks in
Generator-verifier loops: check generated output against an outside source
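The generator-verifier pattern named in the learning objectives pairs a fluent generator with a check that does not rely on the generator's own judgment. Below is a minimal Python sketch under stated assumptions: the generator is a hypothetical stand-in for repeated model calls, the verifier is a hypothetical lookup against a set of known references (in practice it might be a catalog query, a test suite, or a source check), and the citation strings are invented placeholders.

```python
from itertools import islice
from typing import Callable, Iterable, Optional


def generate_and_verify(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes an outside check, or None if none do."""
    for candidate in islice(candidates, max_attempts):
        if verify(candidate):
            return candidate
    return None  # surface "unverified" instead of returning a guess


# Hypothetical stand-in for a trusted source (e.g. a catalog lookup).
KNOWN_REFERENCES = {"Example, A. (2018). A Placeholder Study."}


def hypothetical_generator():
    """Stand-in for repeated model calls; real output would vary by run."""
    yield "Example, A. (2019). A Placeholder Study."  # plausible, wrong year
    yield "Example, A. (2018). A Placeholder Study."  # matches the source


result = generate_and_verify(hypothetical_generator(), lambda c: c in KNOWN_REFERENCES)
print(result)
```

The design choice worth noticing: when nothing passes within the attempt budget, the loop returns `None` instead of the last guess, so an unverified answer is surfaced as unverified.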

Key takeaways

Next Token Prediction refers to the fact that generative AI writes answers word by word based on what tends to follow what.
- Capability zone: tasks that resemble patterns the model has seen many times (summarizing, reformatting, explaining common concepts).
- Limitation zone: novel or sparse territory, and anywhere the task requires distinguishing "true" from "sounds true."
- Fabrication concentrates in specificity: names, dates, statistics, citations, URLs, quotes. The more precise a claim, the more it warrants verification.
- Product features like citations, uncertainty signaling, constrained generation, and generator-verifier loops exist specifically to push this limitation further out.
4D connection: Next Token Prediction is the foundation of Discernment. Knowing the output was generated tells you exactly what kind of scrutiny to apply.

Exercises

Exercise: The Verification Test

Why? You now know that the same generative process that makes AI fluent is the one that makes it fabricate. Time to see that on your own turf, in a domain where you'll catch it.

Go back to your task list and pick the task where you have the deepest domain expertise. You need to be the expert here so you can verify what comes back. Write down five specific, checkable facts from that domain, such as a person's job title, a publication date, a statistic, a product specification, a direct quote, or a URL. Choose things you know to be accurate and can confirm independently.

Now run three probes:

Probe 1: The capability zone. Ask the AI to explain or summarize a well-known concept in your domain. Something popular and well-documented. Note the fluency. Spot-check the content. This is what the capability zone feels like: smooth, confident, and largely accurate.
Probe 2: Specificity under pressure. Ask the AI to provide five checkable specifics in your domain: cite three sources, name an author, give exact figures, provide a URL. Verify every one. Score it out of five: how many were fully accurate? If it fabricates, note how confident it sounded doing it.
Probe 3: Sampling in action. Run the exact same specific-facts request in a fresh conversation. Compare the two outputs. What stayed consistent? What changed? The variation you see is Next Token Prediction's sampling at work.
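For Probe 2, it can help to pull every checkable specific out of a response before you score it. This is a rough Python sketch built on regular expressions; the patterns are illustrative heuristics that miss names, job titles, and many other claim types and can produce false positives, and the sample response is invented.

```python
import re

# Illustrative patterns for a few kinds of specifics where fabrication concentrates.
# Heuristics only: names, job titles, and many other claims are not detected.
PATTERNS = {
    # Stops before trailing punctuation and at "?", so query strings are cut off.
    "url": r"https?://[\w./-]*\w",
    "year": r"\b(?:19|20)\d{2}\b",
    "percentage": r"\b\d+(?:\.\d+)?%",
    "quote": r'"[^"]+"',
}


def verification_checklist(text):
    """Return (kind, value) pairs for each specific found, in order of appearance."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((match.start(), kind, match.group()))
    return [(kind, value) for _, kind, value in sorted(found)]


# Invented sample response, used only to exercise the function.
SAMPLE = (
    'The study, published in 2019, found a 37% drop. '
    'See https://example.org/study. The author wrote "results may vary".'
)

for kind, value in verification_checklist(SAMPLE):
    print(f"[ ] {kind}: {value}")
```

Each printed line is one item to confirm against a source you trust; the score for Probe 2 is how many survive that check.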

Stretch goal: Re-run Probe 2 in a tool with citations enabled (like Research mode in Claude). Score it again. Does having sources to check change the score?

Lesson reflection

Would you have caught fabrications in a domain you didn't know well?
Look at your task list: which tasks sit mostly in the capability zone, and which push into specificity that needs verification?

What's next

Next Token Prediction explains how the AI generates. Next we look at what it's generating from: the Knowledge property. What does the model actually know, where does that knowledge come from, and where are the gaps?

🎬 Video transcript

Source video: kl0gunXTvyk

📜 Transcript (cleaned + AI-translated)

Understanding Next Token Prediction

Hi, my name is David and I'm on the safety team here at Anthropic. Today, I'm here to talk to you about next token prediction, which is a core property that determines where AI answers actually come from. We'll look at what's really happening when AI responds to you, why the same mechanism that produces fluent writing can also produce fabricated facts, and how to tell which zone your task lands in.

If you understand one thing about how generative AI works, let it be this: the operation at the heart of these systems is prediction. Given everything that's been written so far, the model predicts what comes next, one fragment at a time. Generative AI generates an answer, composing it word by word based on what tends to follow what. It's closer to an extraordinarily sophisticated autocomplete than to a search engine, and that distinction matters: a citation that merely looks like a real citation can satisfy the pattern just as well as one pointing to a paper that actually exists.

The Capability Zone vs. The Edge

Let me show you this in action. I'll ask Claude to summarize an argument in a well-known essay. Notice how quickly it produces clean, coherent prose. This is a well-worn path; the model has encountered this task thousands of times.

Now, watch what happens when I ask for something at the edge. Let's say I ask it to list three research papers, with publication years, by a mid-level researcher in a niche subfield. It keeps the same confident tone and the same fluent prose, but the path is thin here, and the model is generating what looks like a good answer. Some of these may be real and some may be fabricated; you have to check the output.

The same generative process is always running when you're working with AI. What changes is how well-worn the path is. Tasks the model has seen in countless variations land in the "capability zone": summarizing, reformatting, explaining common concepts, or drafting in a familiar style. Next token prediction shines here because the patterns are dense and consistent. As you move toward the edge, the patterns thin out. Novel territory and obscure topics sit further toward that edge; the model keeps generating fluently, but the ground underneath gets shakier.

Strengths and Weaknesses of the Generative Process

The strength and weakness are the same property. Fluency comes from next token prediction. Hallucination also comes from next token prediction. You experience one or the other depending on where your task falls on that line.

On the strength side, we see:

Fluent text in any register.
Rapid synthesis across fields.
Strong performance on anything resembling what the model has seen before.
Coherent continuation of any thread you hand it.

On the failure side, we see:

Hallucinations.
Inconsistency.
Misplaced confidence.

Product Features for Grounding and Safety

Frontier labs have built product features to help here:

Citations and source grounding: These let you trace what's backed versus what's generated.
Trained uncertainty signaling: The model is trained to flag its own shakiness, for example by saying, "I'm not sure about this."
Constrained generation and skills: These narrow the space where fabrication can sneak in.
Generator-verifier agent loops: These check generated output against an outside source.

These features exist precisely because the underlying behavior is always generative next token prediction.

Best Practices for AI Discernment

When working with AI outputs, keep these principles in mind:

A confident tone does not signal accuracy. Smoothness and correctness are independent variables.
Specificity is where fabrication concentrates. Names, dates, statistics, citations, quotes, and URLs—the more precise a claim, the more it warrants a check.
Treat outputs as drafts to verify. This is particularly important when stakes are high or the domains are unfamiliar to you.
Evaluate where your task sits on the continuum. Well-worn paths are safer handoffs; thin paths need more scrutiny.
Lean on product surfaces. If your tool offers citations or source grounding, use them. The model can't reliably tell grounded from invented; you have to do that part.

Understanding next token prediction sits at the heart of discernment in the 4D framework. You can't evaluate an output well without understanding that it was generated or composed to fit a shape. It also informs delegation: tasks deep in the capability zone are safer handoffs, while tasks near the edge deserve more of your attention on the back end. With this knowledge at hand, AI becomes predictable rather than surprising.

📚 Source & attribution

Original Anthropic Academy lesson: https://anthropic.skilljar.com/ai-capabilities-and-limitations/456447