AI models like ChatGPT don't retrieve pre-written answers or compose responses all at once. They generate text incrementally through autoregressive generation, predicting one word (or token) at a time. When you submit a prompt, the AI reads your input, predicts the most likely first word based on patterns learned from billions of training examples, then uses your prompt plus that first word to predict the second word, and continues this sequential process until it reaches a natural stopping point. This word-by-word approach explains why AI responses sometimes shift direction mid-answer (each word creates new context), why longer responses can lose coherence (small imperfections compound, like a game of telephone), and why you can't interrupt and redirect (the AI treats previous words as fixed context). It also means that at every step the AI faces thousands of possible next-word options and must evaluate which makes the most sense grammatically and contextually: after writing "The capital of France is...", it could continue with "Paris," "located," or countless alternatives. By the end of this lesson, you'll see the visible streaming of ChatGPT's responses for what it truly is: real-time prediction happening at that exact moment, not typing from memory. One wrinkle remains, though: AI doesn't actually process complete "words" the way humans do. It breaks text into smaller units called tokens, which fundamentally changes how it processes and generates language, a concept the next lesson explores.
In the previous lesson, you learned that AI learns from patterns in massive datasets rather than following programmed rules. You now understand why larger models with more parameters generally perform better—but there's still a missing piece.
You know that ChatGPT, Claude, and Gemini learn by analyzing billions of examples of text. They've studied countless conversations, articles, books, and websites to recognize patterns in how language works.
But here's the problem: recognizing patterns is about understanding input. When you type a question into ChatGPT, it needs to produce an output—a response made of actual words and sentences.
Pattern recognition tells the AI "this is what spam looks like" or "this is how people typically respond to questions." But it doesn't explain how the AI transforms that knowledge into the specific words you see appearing on your screen, one after another.
Think about it: When you ask ChatGPT "What is photosynthesis?", how does it decide to start with "Photosynthesis is..." instead of "Plants use..." or "The process of..."? How does it know when to stop writing? How does it maintain coherent thoughts across multiple paragraphs?
The key insight is this: AI doesn't write complete responses all at once. It doesn't have a pre-written answer stored somewhere in its memory that it retrieves when you ask a question.
Instead, AI generates text word by word—technically, piece by piece, which we'll call "tokens" shortly. Each word it writes depends on all the words that came before it.
Here's what happens when you send a message to ChatGPT:

1. The AI reads your entire prompt as context.
2. It predicts the most likely first word of the response, based on patterns learned during training.
3. It appends that word to the context and predicts the second word using your prompt plus everything it has written so far.
4. It repeats this loop, one word at a time, until it reaches a natural stopping point.
This process is called autoregressive generation: each new word is predicted from everything that came before it, with the model's own earlier output fed back in as context.
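To make the loop concrete, here's a minimal sketch in Python. The next-word table and its probabilities are invented for illustration, and this toy conditions only on the most recent word, whereas a real model scores tens of thousands of tokens while conditioning on everything written so far. The loop structure, though, is the same: predict, append, repeat.

```python
import random

# Toy next-word table. All words and probabilities here are made up
# for illustration; a real model scores its entire vocabulary.
NEXT_WORD_PROBS = {
    "The": {"capital": 0.5, "process": 0.3, "answer": 0.2},
    "capital": {"of": 0.9, "city": 0.1},
    "of": {"France": 0.6, "Spain": 0.4},
    "France": {"is": 1.0},
    "is": {"Paris": 0.7, "located": 0.2, "known": 0.1},
    "Paris": {"<end>": 1.0},
}

def generate(prompt_word, max_words=10):
    """Autoregressive loop: predict a word, append it, repeat.

    This toy conditions only on the most recent word; a real model
    conditions on the entire text generated so far.
    """
    text = [prompt_word]
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(text[-1])
        if options is None:
            break
        # Sample the next word in proportion to its probability.
        words, probs = zip(*options.items())
        next_word = random.choices(words, weights=probs)[0]
        if next_word == "<end>":  # natural stopping point
            break
        text.append(next_word)
        print(" ".join(text))  # the response grows one word at a time
    return " ".join(text)

generate("The")
```

Run it a few times: because the next word is sampled from a probability distribution rather than fixed in advance, the output can differ between runs, which is one reason the same prompt can yield different responses.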
Understanding that AI writes one word at a time explains several behaviors you've probably noticed:
Why AI sometimes changes direction mid-response: Since each word is predicted one at a time from the evolving context, the AI might start down one path and then shift as new words create new context.
Why longer responses can lose coherence: The further the AI gets from your original prompt, the more it's relying on its own generated words for context. Like a game of telephone, small imperfections can compound.
Why you can't interrupt and redirect: The AI has already committed to the words it's written. It can't go back and revise them, because each new prediction treats the previous words as fixed context.
When you watch ChatGPT's response appearing word by word on your screen, you're actually seeing this prediction process happen in real-time. It's not typing out a pre-written answer—it's deciding what to write next at that exact moment.
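Here's a small illustration of why streaming falls naturally out of this design: each word exists the moment it's predicted, so a program can hand it to the screen immediately rather than waiting for the whole response. Everything in this sketch is hypothetical; `predict_next_word` is a stand-in for a real model and just returns canned words.

```python
import time

def predict_next_word(context):
    """Hypothetical stand-in for a real model's next-word prediction."""
    canned = ["Photosynthesis", "is", "how", "plants", "convert",
              "sunlight", "into", "chemical", "energy.", None]
    return canned[len(context)] if len(context) < len(canned) else None

def stream_response(prompt):
    context = []
    while True:
        word = predict_next_word(context)
        if word is None:  # stopping point reached
            break
        context.append(word)
        yield word  # hand each word to the UI the moment it exists

for word in stream_response("What is photosynthesis?"):
    print(word, end=" ", flush=True)
    time.sleep(0.2)  # visible pause, mimicking real-time generation
print()
```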
Now that you know AI generates text one word at a time, a new question emerges: How does it actually make that prediction?
At any given moment, there are thousands of possible words that could come next. When the AI has written "The capital of France is...", it could continue with "Paris," "located," "a," "known," "called," or countless other options.
Some words make sense. Some don't. Some are grammatically correct but contextually wrong. The AI needs a method to evaluate all possibilities and choose the best one.
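One standard method, used by models like ChatGPT, is to assign every candidate a raw score (called a logit) and convert those scores into probabilities with the softmax function. Here's a minimal sketch; the candidate words and score values are invented for illustration, and a real model scores its entire vocabulary, not five words.

```python
import math

# Hypothetical raw scores (logits) a model might assign to a few
# candidates after "The capital of France is..." -- values are invented.
logits = {"Paris": 9.1, "located": 6.3, "a": 5.0, "known": 4.2, "Berlin": 1.5}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the max score first for numerical stability.
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = softmax(logits)
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:>8}: {p:.1%}")
# "Paris" dominates, but grammatically plausible continuations like
# "located" still receive some probability mass.
```

Notice that the scores don't just separate sense from nonsense: contextually wrong but grammatical options like "Berlin" get low scores, while plausible alternatives like "located" keep a modest share of the probability.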
Understanding that AI writes word-by-word is the foundation, but it raises an important technical question: what exactly counts as a "word" for AI? You'll discover that AI doesn't actually work with complete words the way humans do—it breaks text into smaller pieces called tokens, and this changes everything about how it processes and generates language.