TOEIC Link Listening — Question-Stem Preview and Answer Prediction: How a 12-Second Preview Window Lifts the Listening Band from 16 to 25

Every TOEIC Link listening item is preceded by a preview window of roughly 8 to 14 seconds during which the question stem and the four answer choices are visible on screen but no audio is playing. Internal practice-corpus data shows that band-16 candidates use this preview window on roughly 22% of items, while band-25 candidates use it on roughly 94% of items, and that the gap between the two bands on items where the preview window is fully used is more than seventeen percentage points. The preview window is the most under-used resource in the listening module, and the candidate who installs a disciplined preview protocol turns it into the largest single source of band improvement.

This guide separates the preview window into a four-step protocol, lists the four prediction patterns the candidate must recognize, and outlines a six-week routine that integrates the preview discipline with the rest of the listening-module preparation. For broader context on listening-module preparation, see the listening strategies by question type guide and the listening module overview.

The four-step preview protocol

The candidate runs the preview protocol every time the question stem and answer choices appear on screen and the audio has not yet started. The protocol takes between 8 and 12 seconds and produces four concrete outputs that the candidate uses while the audio plays: the question type, the discrimination axis, an eliminated choice, and a predicted answer.

Step 1 — Read the question stem and classify the question type

The first move is to read the question stem in full and assign the item to one of the six TOEIC Link listening question types: gist, detail, inference, purpose, speaker-attitude, or future-action. The question type determines which features of the audio the candidate must track. A detail item demands tracking a specific number, name, or time expression; a gist item demands tracking the overall topic; an inference item demands tracking the speaker's implicit position; a purpose item demands tracking why the speaker is speaking; a speaker-attitude item demands tracking tone and modal verbs; a future-action item demands tracking the final third of the audio where the next step is usually stated. Classifying the question type before the audio begins primes the candidate's attention toward the right features and prevents wasted attention on irrelevant material.
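The mapping from question type to tracked feature can be written down as a small lookup table for use in drill software or a study sheet. The sketch below is illustrative only: the names `TRACKING_FOCUS` and `features_to_track` are hypothetical, and the entries simply paraphrase the list above.

```python
# Hypothetical lookup: question type -> what to track in the audio.
# Entries paraphrase the guide; the names are illustrative assumptions.
TRACKING_FOCUS = {
    "gist": "overall topic",
    "detail": "specific number, name, or time expression",
    "inference": "speaker's implicit position",
    "purpose": "why the speaker is speaking",
    "speaker-attitude": "tone and modal verbs",
    "future-action": "final third of the audio, where the next step is usually stated",
}


def features_to_track(question_type: str) -> str:
    """Return the tracking focus for a classified question type."""
    key = question_type.strip().lower()
    if key not in TRACKING_FOCUS:
        raise ValueError(f"unknown question type: {question_type!r}")
    return TRACKING_FOCUS[key]
```

Calling `features_to_track("detail")` returns the detail-item focus, and an unrecognized label raises an error rather than silently defaulting, which mirrors the point that classification has to happen before the audio starts.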

Step 2 — Scan the four answer choices and identify the discrimination axis

The second move is to scan the four answer choices in roughly two seconds and identify the discrimination axis, that is, the dimension along which the four choices differ. On detail items the discrimination axis is usually a number, a name, or a location; on inference items it is usually a position or a motivation; on purpose items it is usually the speaker's reason for speaking, with each choice offering a different plausible reason. The candidate who identifies the discrimination axis before the audio begins knows exactly which feature to track and can disregard the rest of the audio.

Step 3 — Eliminate one obviously wrong choice

The third move is to eliminate one answer choice that is obviously inconsistent with the question stem or that contains a trap-keyword pattern (a word from the audio used in a way the question stem does not support, an extreme modifier like "always" or "never" on what is clearly a moderated topic, or a wrong-tense formation that mismatches the question stem's temporal frame). Eliminating one choice before the audio begins compresses the four-option problem into a three-option problem and reduces decision latency once the audio finishes.

Step 4 — Form a one-sentence prediction of the expected answer

The fourth move is to form a one-sentence internal prediction of what the answer is likely to be, based on the question stem and the surviving three choices. The prediction is rarely correct verbatim, but it primes recognition: when the audio plays and the matching phrase appears, the candidate recognizes it immediately rather than parsing it for the first time. Prediction is the single largest accuracy multiplier on inference and speaker-attitude items, where the audio rarely uses the question stem's vocabulary and the candidate must map paraphrase to paraphrase.

The four prediction patterns

The candidate's prediction in step 4 should follow one of four prediction patterns, each of which corresponds to a recognizable item structure.

Pattern 1 — Number-extraction prediction

On detail items where the discrimination axis is a number (price, quantity, time, percentage), the candidate predicts that the audio will state two or three numbers and the correct answer is one of them. The candidate's attention during audio playback is dedicated to extracting all numbers in sequence and mapping each to its referent. Trap items frequently include a distractor number that the speaker mentions in passing (a previous price, a rejected proposal, a hypothetical), and the candidate must distinguish the operative number from the distractor.
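For post-drill review, the "extract every number in sequence and map it to its referent" habit can be rehearsed against a practice transcript. The following is a minimal sketch under stated assumptions: `extract_numbers` is a hypothetical helper, it only matches numbers written as digits (a transcript that spells numbers out as words would need a different approach), and the "referent" is approximated by the few words that precede each number.

```python
import re

# Matches integers and decimals written as digits, optionally followed by "%".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?%?")


def extract_numbers(transcript: str, context_words: int = 3) -> list[tuple[str, str]]:
    """Return (number, preceding context) pairs in order of appearance."""
    results = []
    for match in NUMBER_RE.finditer(transcript):
        # Words immediately before the number serve as a rough referent.
        before = transcript[:match.start()].split()
        results.append((match.group(), " ".join(before[-context_words:])))
    return results
```

On a transcript such as "The old price was 40 dollars but the new price is 35 dollars.", the preceding context ("old price was" versus "new price is") is what separates the distractor number from the operative one.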

Pattern 2 — Topic-paraphrase prediction

On gist items, the candidate predicts that the audio will state the topic in a phrase that paraphrases the correct answer choice. The candidate's attention during audio playback is dedicated to recognizing the topic phrase in the audio and mapping it to the correct answer choice. Trap items frequently surface topic-adjacent vocabulary in distractor choices, and the candidate must reject choices that match the audio's vocabulary but not its actual topic.

Pattern 3 — Position-or-attitude prediction

On inference and speaker-attitude items, the candidate predicts that the audio will state the speaker's position or attitude in implicit form — through tone, modal verb choice, hedging, or contrastive structure — and the correct answer choice will paraphrase the implicit position in explicit form. The candidate's attention during audio playback is dedicated to tracking modal verbs ("might", "should have", "would rather"), hedges ("I suppose", "to be fair"), and contrastive markers ("although", "even so"), each of which is a position-disclosure point.
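When reviewing a practice transcript after a drill, the position-disclosure points can be located mechanically. The sketch below assumes a plain-text transcript and uses only the example markers quoted above; the marker lists are not exhaustive, and `find_disclosure_points` is a hypothetical helper name.

```python
import re

# Marker lists are taken from the examples in the text; they are not exhaustive.
MARKERS = {
    "modal": ["might", "should have", "would rather"],
    "hedge": ["i suppose", "to be fair"],
    "contrast": ["although", "even so"],
}


def find_disclosure_points(transcript: str) -> list[tuple[int, str, str]]:
    """Return (character offset, category, phrase) for each marker, in order."""
    text = transcript.lower()
    hits = []
    for category, phrases in MARKERS.items():
        for phrase in phrases:
            # Word boundaries stop "might" from matching inside "mighty".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text):
                hits.append((match.start(), category, phrase))
    return sorted(hits)
```

Comparing the offsets returned here with the points the candidate actually noticed during playback shows which marker category is being missed.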

Pattern 4 — Future-action prediction

On future-action items, the candidate predicts that the audio's final third will contain the next step in explicit form (a statement of intent, a request, a commitment), and the correct answer choice will paraphrase this final-third statement. The candidate's attention during the first two-thirds of the audio is reduced, and the final third is tracked with elevated attention.

The preview-window length-discipline rule

A common failure mode at bands 16 through 20 is the candidate who reads the question stem and then reads each answer choice in full, with the result that 12 seconds is not enough to finish and the audio begins before the prediction is formed. The length-discipline rule is the corrective: the candidate reads the question stem in full (roughly 3 seconds), scans the answer choices in roughly 2 seconds without reading them in full, and uses the remaining 6 to 7 seconds for elimination and prediction. The discipline is to scan rather than read the answer choices in the preview phase and to read in detail only after the audio finishes, when the elimination already in place has reduced the cognitive load. See also the listening note-taking strategies guide for the complementary post-audio discipline.
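The arithmetic behind the length-discipline rule can be checked with a short calculation. This is a sketch only: `preview_budget` is a hypothetical helper, and the 3-second and 2-second defaults are the approximate figures given above.

```python
def preview_budget(window_s: float, stem_s: float = 3.0, scan_s: float = 2.0) -> dict[str, float]:
    """Split a preview window into stem reading, choice scanning, and the remainder."""
    remaining = window_s - stem_s - scan_s
    if remaining <= 0:
        raise ValueError("window too short for stem reading plus choice scan")
    # The remainder is what is left for elimination and prediction.
    return {"stem": stem_s, "scan": scan_s, "eliminate_and_predict": remaining}
```

With a 12-second window the remainder is 7 seconds, matching the upper figure in the rule; at the short end of the 8-to-14-second range the remainder drops to 3 seconds, which is why scanning rather than reading the choices matters most on short windows.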

The six-week routine

Weeks 1-2 — Preview-protocol installation drill

The candidate drills the four-step preview protocol on 30 items per week (for example, 5 sessions of 6 items or 6 sessions of 5 items) with the audio paused immediately after the preview window so that the candidate can write down the question type, the discrimination axis, the eliminated choice, and the predicted answer before unpausing. Each week's output is a preview-protocol compliance log that confirms all four steps are being executed on every item.
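One way to keep the compliance log is as a list of per-item records with one field per written output. The sketch below is an assumption about how such a log might be structured, not a prescribed format; `PreviewRecord` and `compliance_rate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreviewRecord:
    """One item's written preview notes; field names are illustrative."""
    question_type: Optional[str] = None
    discrimination_axis: Optional[str] = None
    eliminated_choice: Optional[str] = None
    predicted_answer: Optional[str] = None

    def is_compliant(self) -> bool:
        # Compliant only if all four steps produced a written output.
        return all([self.question_type, self.discrimination_axis,
                    self.eliminated_choice, self.predicted_answer])


def compliance_rate(records: list[PreviewRecord]) -> float:
    """Fraction of items on which all four steps were logged."""
    if not records:
        return 0.0
    return sum(r.is_compliant() for r in records) / len(records)
```

A rate below 1.0 at the end of a week points to the specific step being skipped, since the empty field in each non-compliant record shows where the protocol broke down.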

Weeks 3-4 — Pattern-recognition drill

The candidate drills 40 items per week with the four prediction patterns assigned: 10 number-extraction items, 10 topic-paraphrase items, 10 position-or-attitude items, and 10 future-action items. The audio plays at normal speed and the candidate's predicted answer is logged before the audio finishes. Each week's output is a prediction-accuracy log per pattern that identifies the weakest pattern for emphasis in weeks 5-6.
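The per-pattern accuracy log reduces to a simple tally. The following sketch assumes each log entry is a (pattern name, prediction was correct) pair; the function names and the entry format are hypothetical.

```python
from collections import defaultdict


def accuracy_by_pattern(log: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each pattern name to the fraction of its predictions that were correct."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for pattern, was_correct in log:
        totals[pattern] += 1
        correct[pattern] += int(was_correct)
    return {p: correct[p] / totals[p] for p in totals}


def weakest_pattern(log: list[tuple[str, bool]]) -> str:
    """Return the pattern with the lowest accuracy (first seen wins a tie)."""
    scores = accuracy_by_pattern(log)
    if not scores:
        raise ValueError("log is empty")
    return min(scores, key=scores.get)
```

With 10 items per pattern per week, each accuracy figure rests on a small sample, so a one- or two-item difference between patterns is weak evidence; pooling both weeks before choosing the emphasis pattern gives a steadier signal.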

Weeks 5-6 — Integrated mock-section drill

The candidate completes five integrated listening mock sections per week at the TOEIC Link timed pace, with the preview protocol and pattern recognition running automatically in the background. Each week's output is a band-equivalent score per mock section and a residual-error log that surfaces any item where the preview protocol failed to produce a prediction.

Band-by-band targets

Band 16: Preview protocol used on roughly one in five items; question-type classification frequently delayed until after audio; prediction formed on roughly one in ten items.
Band 19: Preview protocol used on roughly half of items; question-type classification reliable on the four most common types; prediction formed on roughly one in three items.
Band 22: Preview protocol used on more than 80% of items; question-type classification automatic; prediction formed on more than 60% of items; trap-choice elimination correct on more than 70% of items.
Band 25: Preview protocol used on more than 95% of items; question-type classification automatic and instant; prediction formed on more than 85% of items; trap-choice elimination correct on more than 90% of items.
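The numeric targets for bands 22 and 25 can be checked against a candidate's logged rates. The sketch below is illustrative: `TARGETS` and `meets_target` are hypothetical names, only the three rates stated as "more than" thresholds are encoded, bands 16 and 19 are omitted because their figures are given only approximately, and the qualitative criteria (such as automatic question-type classification) are not modeled.

```python
# Thresholds copied from the band 22 and band 25 lines above.
# Keys are illustrative names for the three logged rates.
TARGETS = {
    22: {"preview": 0.80, "prediction": 0.60, "elimination": 0.70},
    25: {"preview": 0.95, "prediction": 0.85, "elimination": 0.90},
}


def meets_target(band: int, rates: dict[str, float]) -> bool:
    """True if every measured rate strictly exceeds the band's threshold."""
    return all(rates[name] > floor for name, floor in TARGETS[band].items())
```

For example, a candidate logging a 0.90 preview rate, 0.70 prediction rate, and 0.75 elimination rate clears the band 22 thresholds but not the band 25 ones.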

Closing note

The preview window is structurally guaranteed to be available on every TOEIC Link listening item, and the candidate who installs the four-step protocol and the four prediction patterns converts it from an unused gap into the single largest accuracy multiplier in the listening module. The transfer effect extends to the speaking module's response preparation, where the same prediction discipline shortens the time between prompt and response and improves the response's structural cohesion.