Extended Discourse and Multi-Turn Coherence Control in TOEIC Link Speaking

Candidates who reach the low-20s band on TOEIC Link Speaking and then plateau there for months are almost always solving the wrong problem. They drill pronunciation, expand vocabulary, and rehearse opinion templates — and their single-sentence accuracy keeps improving — but their band score does not move. The reason is that the rubric for the extended-response and opinion-response tasks does not reward single-sentence accuracy past a certain ceiling. It rewards multi-turn coherence, which is the ability to sustain a structured line of reasoning across the full 45-second response without the listener losing track of what is being argued and why.

This article walks through what the rubric is measuring under the label "discourse coherence," the three failure modes that cap intermediate candidates in the low 20s, the structural moves that high-band candidates use to stay coherent under time pressure, and a four-week drill protocol that builds the skill into reflex rather than effort.

What multi-turn coherence actually means on the rubric

The TOEIC Link Speaking rubric scores responses on a four-axis grid — pronunciation, grammar, vocabulary, and discourse coherence. The first three axes have visible ceilings: a candidate with clean phoneme production, solid sentence-level grammar, and a working B2-level lexicon will score at the top of those three axes consistently. Discourse coherence does not have a visible ceiling at that level. Two candidates with identical pronunciation, grammar, and vocabulary scores can land two bands apart on overall score because one of them controls discourse and the other does not.

Discourse coherence is measured on three sub-axes that the rubric does not name explicitly but that every certified rater applies.

The first sub-axis is topical continuity, which is the degree to which each clause in the response advances the same line of reasoning rather than restarting or drifting. A response that begins with a position, gives a reason, gives an example, and closes with a restatement scores high. A response that begins with a position, gives a reason, then drifts into an unrelated observation, then returns to a restated position scores low even if every individual sentence is grammatical.

The second sub-axis is logical signposting, which is the degree to which the relationships between clauses are made visible to the listener through transition language. A candidate who says "Remote work increases productivity. People avoid commute time. Companies save office costs" gives three separate observations. A candidate who says "Remote work increases productivity for two reasons. First, employees avoid commute time, which adds two productive hours to the day. Second, companies save on office costs, which they can reinvest in tools that further boost output" gives the same content but signals the relationships, and scores roughly one band higher on discourse for the same lexical inventory.

The third sub-axis is reference tracking, which is the degree to which pronouns, definite articles, and demonstratives point unambiguously to their referents across the response. A candidate who introduces an example with "I once worked at a company that..." and then refers to "they" three sentences later, while the rater is uncertain whether "they" means the company or the company's competitors, has broken reference tracking even if the grammar is impeccable.

For related coverage of how transition language operates at the clause level, see the guide on discourse markers and cohesion. For the writing-side equivalent of these same principles, see the guide on coherence and cohesion devices.

The three failure modes that cap intermediate candidates

Internal practice-corpus data on candidates who plateau in the 20-to-22 band shows three recurring discourse failure modes. Each of them is fixable in roughly two to three weeks of targeted practice, but they are rarely diagnosed because candidates and most teachers focus on the visible ceilings — pronunciation, grammar, vocabulary — rather than the hidden one.

Failure mode 1 — The reset loop

The reset loop happens when a candidate gives a position, develops it for ten or twelve seconds, hits a hesitation point, and restarts the response with a slightly rephrased version of the original position. The reset is visible to the rater as a discourse break, because the response sounds like two short answers stitched together rather than one extended answer. A response with two visible resets in a 45-second window will score in the 18-to-20 band even if the content is otherwise strong.

The reset loop is usually a symptom of the candidate running out of elaboration material at the 12-second mark and not having a structural move to extend the response. The fix is to drill the three elaboration moves (zooming in to a specific instance, contrasting with an alternative, and generalizing to a broader principle) until at least one of them is available reflexively at every hesitation point. See the guide on impromptu elaboration and on-the-spot development for the detailed drill set.

Failure mode 2 — The signpost-free chain

The signpost-free chain happens when a candidate strings together five or six grammatically clean sentences without any transition language. The response sounds like a list rather than an argument. To the rater, the response demonstrates that the candidate can produce sentences but does not demonstrate that the candidate can construct a paragraph. The rubric reads the absence of signposts as evidence that the candidate is operating below B2 on the discourse axis even if the candidate's lexicon is C1.

The fix is to install a small set of signposts that the candidate uses without thinking — typically a one-word opener ("First," / "Second," / "Finally,"), a one-clause causal connector ("which means," / "because of this,"), and a one-clause contrastive connector ("although," / "even so,"). Three signposts deployed across a 45-second response are sufficient to move the discourse score from below-B2 to solid B2.

Failure mode 3 — The drifting referent

The drifting referent happens when a candidate uses a pronoun or demonstrative whose antecedent is unclear because the response has introduced multiple potential antecedents in quick succession. The rater spends a fraction of a second resolving the ambiguity and that fraction of a second is registered as a coherence break. A response with three or more drifting referents in 45 seconds caps at the 20-to-22 band.

The fix is to drill an explicit-naming protocol where the candidate uses the noun phrase rather than the pronoun whenever there are two or more potential antecedents in the prior three clauses. The protocol feels redundant at first and produces sentences like "When companies adopt remote work, the companies typically see a productivity increase in the first quarter" rather than "When companies adopt remote work, they typically see a productivity increase in the first quarter." The redundancy disappears from the response within a week of drilling because the candidate begins to vary the noun phrase ("the firms," "those organizations") rather than reverting to pronouns.
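The explicit-naming rule can be sketched as a small review aid. This is a minimal illustration, not part of any official scoring tool: it assumes the candidate has annotated each clause of a transcript by hand with the noun phrases it introduces and whether it uses a pronoun. The function name flag_risky_pronouns and the annotation format are hypothetical.

```python
def flag_risky_pronouns(clauses):
    """Return indices of clauses where a pronoun should become a noun phrase.

    clauses: list of (noun_phrases_introduced, uses_pronoun) tuples in
    spoken order. The annotations are made by hand (an assumption of this
    sketch); no automatic parsing is attempted.
    """
    flagged = []
    for i, (_, uses_pronoun) in enumerate(clauses):
        if not uses_pronoun:
            continue
        # Look back over the prior three clauses only.
        window = clauses[max(0, i - 3):i]
        candidates = {noun for nouns, _ in window for noun in nouns}
        # Two or more potential antecedents: name the referent explicitly.
        if len(candidates) >= 2:
            flagged.append(i)
    return flagged


# Example: "company" and "competitors" are both live when "they" appears.
review = [
    (["company"], False),
    (["competitors"], False),
    ([], True),
]
print(flag_risky_pronouns(review))
```

The sketch only applies the counting rule; deciding which noun phrases are plausible antecedents is still a judgment the candidate makes during review.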

The structural moves high-band candidates use

Candidates who score above the 26 band on TOEIC Link Speaking use a small set of structural moves that make multi-turn coherence reflexive. The moves are not memorized templates — templates fail under the time pressure of impromptu speaking — but small structural commitments that the candidate makes at the start of the response and then fulfills.

The most common move is the two-reason commitment, where the candidate states their position and immediately announces that there are two reasons supporting it. The commitment is consequential because it converts the rest of the response into a search for two specific items rather than an open-ended attempt to fill 45 seconds. The candidate's cognitive load drops, the response gains automatic structure, and the rater hears a coherent argument.

A second common move is the concession-then-recovery, where the candidate acknowledges a counter-argument and then explains why their position holds despite it. The move is structurally powerful because it demonstrates the candidate's ability to handle alternative perspectives, which is one of the highest-leverage signals on the speaking rubric. The concession does not need to be substantive — a single clause acknowledging the counter-position is sufficient. The recovery does most of the work. See the guide on argumentative balance and concession management for the detailed concession protocols.

A third common move is the specific-then-general, where the candidate opens with a concrete instance and then extracts the general principle from it. The move is the inverse of the more common general-then-specific structure and it scores higher because it demonstrates inductive reasoning, which is rarer in the candidate pool than deductive reasoning. The candidate begins with "Last quarter at my company, we replaced three legacy systems with a single platform, which cut our maintenance overhead by about 40 percent" and then extracts "This is why I think technology consolidation is the highest-leverage cost-saving move for mid-sized firms — the maintenance overhead of fragmented systems compounds in ways that are invisible until you remove it."

The four-week drill protocol

The four-week drill protocol builds multi-turn coherence into reflex through structured daily practice. The protocol assumes 25 to 35 minutes per day of focused practice. Candidates who can sustain the daily commitment for four weeks typically see a one-to-two band shift on discourse coherence and a corresponding shift in overall band.

Week 1 — Signpost installation

Week 1 focuses exclusively on installing the three core signposts. The candidate selects three prompts per day, records 45-second responses, and counts the number of transition phrases used. The week-one target is at least three signposts per response. The candidate does not try to improve content or vocabulary during week one — the only metric is signpost count.

By the end of week 1, the candidate should be producing responses with three signposts without thinking about it. If the signposts feel forced or scripted, the drill needs another five to seven days of practice.
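For candidates who transcribe their recordings, the week-1 count can be automated. The following is a rough sketch under stated assumptions: the phrase list is taken from the example signposts given earlier in this article, and the function names count_signposts and meets_week1_target are illustrative. The matching is naive, so "first" in "the first quarter" is counted as well and should be discounted by hand.

```python
import re

# Example signposts from this article; extend with the candidate's own set.
SIGNPOSTS = [
    "first", "second", "finally",
    "which means", "because of this",
    "although", "even so",
]


def count_signposts(transcript):
    """Count case-insensitive, whole-phrase occurrences of each signpost."""
    text = transcript.lower()
    counts = {}
    for phrase in SIGNPOSTS:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[phrase] = hits
    return counts


def meets_week1_target(transcript, target=3):
    """True when the response uses at least `target` signposts in total."""
    return sum(count_signposts(transcript).values()) >= target


chain = ("Remote work increases productivity. People avoid commute time. "
         "Companies save office costs.")
print(count_signposts(chain), meets_week1_target(chain))
```

Run on the signpost-free chain above, the counter finds nothing and the target check fails, which is the pattern week 1 is meant to remove.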

Week 2 — Two-reason commitment

Week 2 adds the two-reason commitment to the practice set. The candidate continues the daily three-prompt recording but now opens every response with an explicit two-reason structure. The week-two target is a response that names two reasons in the first ten seconds and fully develops both reasons by the 35-second mark.

The most common week-2 failure mode is the candidate committing to two reasons but only developing one. The fix is to set an audible timer at 25 seconds — when the timer rings, the candidate transitions to the second reason regardless of where they are in the first reason. The forced transition produces awkward responses at first but builds the time-budget discipline within five to seven days. See the guide on time budget allocation and response pacing for the related pacing drills.
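The week-2 checkpoints can also be checked after the fact from a recording. This is a minimal sketch, assuming the candidate notes three timestamps by hand while replaying the response; the ResponseTimings fields and the week2_issues function are hypothetical names, and the thresholds (10, 25, and 35 seconds) are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class ResponseTimings:
    """Hand-logged timestamps, in seconds from the start of the response."""
    reasons_announced_at: float      # both reasons have been named
    second_reason_started_at: float  # transition to the second reason
    second_reason_finished_at: float  # second reason fully developed


def week2_issues(t):
    """Return a list of missed week-2 checkpoints (empty means on target)."""
    issues = []
    if t.reasons_announced_at > 10:
        issues.append("two reasons not named within the first 10 seconds")
    if t.second_reason_started_at > 25:
        issues.append("second reason started after the 25-second timer")
    if t.second_reason_finished_at > 35:
        issues.append("both reasons not developed by the 35-second mark")
    return issues


print(week2_issues(ResponseTimings(8, 24, 34)))
print(week2_issues(ResponseTimings(9, 31, 44)))
```

A response that commits to two reasons but lingers on the first will typically show up as a late second-reason start, which is the failure the audible timer is meant to correct.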

Week 3 — Reference tracking

Week 3 introduces the explicit-naming protocol for reference tracking. The candidate continues the daily three-prompt recording and reviews each recording for ambiguous pronouns. Every ambiguous pronoun is replaced with an explicit noun phrase, and the candidate re-records the response with the corrections in place.

The drill is initially tedious because most candidates have three to five ambiguous pronouns per response. By the end of week 3, the count typically drops to zero or one per response, and the candidate has internalized the protocol enough that the explicit-naming happens during production rather than during review.

Week 4 — Integration under realistic conditions

Week 4 integrates the three skills into a single response under realistic test conditions. The candidate uses unfamiliar prompts, records responses without preparation time, and reviews each response on all three sub-axes: topical continuity, logical signposting, and reference tracking. The week-four target is a response that scores 3 out of 3 on signposting, 3 out of 3 on continuity, and contains at most one ambiguous referent.

Candidates who reach the week-4 target consistently across five consecutive practice sessions are typically ready to demonstrate the new band on the next live administration. Candidates who are inconsistent at week 4 should not retake the test yet; they benefit from another two-week reinforcement cycle first.
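The week-4 readiness rule can be written down as a simple self-tracking check. The sketch below assumes the candidate self-rates signposting and continuity on a 0-to-3 scale and counts ambiguous referents per session; the SessionReview record and both function names are illustrative, and "five consecutive sessions" is interpreted here as the five most recent ones.

```python
from dataclasses import dataclass


@dataclass
class SessionReview:
    """Self-review of one practice session (self-rated, not official scores)."""
    signposting: int          # 0 to 3
    continuity: int           # 0 to 3
    ambiguous_referents: int  # count found on review


def meets_week4_target(r):
    """3 of 3 on signposting and continuity, at most one ambiguous referent."""
    return (r.signposting == 3
            and r.continuity == 3
            and r.ambiguous_referents <= 1)


def ready_for_retake(sessions, streak=5):
    """True when the most recent `streak` sessions all meet the target."""
    if len(sessions) < streak:
        return False
    return all(meets_week4_target(s) for s in sessions[-streak:])


log = [SessionReview(3, 3, 1)] * 4 + [SessionReview(3, 2, 0)]
print(ready_for_retake(log))
```

A single session below target breaks the streak, which matches the advice to run a reinforcement cycle rather than retake on an inconsistent record.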

Why this matters more than vocabulary expansion

Most TOEIC Link Speaking candidates plateau in the low-20s band and respond by drilling more vocabulary, more idioms, and more topic-specific phrasing. The investment rarely shifts the band score because vocabulary is not the binding constraint at the 20-band level. Discourse coherence is.

The asymmetry is large. A candidate who adds 500 vocabulary items to their active range over four weeks will typically see a 0-to-1 band shift. A candidate who installs the three signpost moves, the two-reason commitment, and the explicit-naming protocol over the same four weeks will typically see a 1-to-2 band shift. The same time investment yields roughly one additional band of movement when it targets discourse rather than lexicon.

The implication for study planning is direct. Candidates in the 20-to-22 band who want to cross into the 24-to-26 band should invest the next four weeks in discourse drills rather than vocabulary expansion. Vocabulary expansion is the right investment for the move from the 26 band to the 28 band, where lexicon range becomes the binding constraint again. The sequencing matters as much as the choice of drill. For related guidance on cross-skill priority sequencing, see the guide on speaking and writing tips.