TOEIC Link Listening — Turn-Taking Cues in Multi-Speaker Conversations: The Auditory Signals That Decide Part 3 Accuracy
TOEIC Link Part 3 conversations contain explicit turn-taking cues — short discourse markers, characteristic intonation contours, and overlap repairs — that signal who will speak next and what the next speaker is likely to say. Candidates who recognize these cues consistently outperform candidates who rely on lexical retention alone, with the performance gap widening on three-speaker conversations and on items that ask about the next likely action.
The cues are not background noise. They are functional elements of conversational structure, studied in conversation analysis since the 1970s, and they are exploited deliberately by the TOEIC Link Part 3 script writers. A candidate who treats the cues as informational, rather than as filler, gains roughly two to three additional correct items per Part 3 section.
For related Part 3 listening topics, see the dedicated guides on detail-vs-main-idea discrimination, numbers and time expressions, and inference and implication questions.
Why turn-taking cues matter on Part 3
Three properties of Part 3 conversations make turn-taking cues unusually high-value for test performance.
Property 1 — three-speaker conversations require speaker tracking. Approximately 25% of Part 3 conversations involve three speakers rather than two. Speaker tracking is harder in these conversations because the listener must attribute each turn to one of three voices rather than simply alternating between two. Turn-taking cues, particularly the name-vocative pattern ("Sara, what do you think?") and the back-reference pattern ("As John mentioned, ..."), are the most reliable signals of speaker identity in three-speaker conversations.
Property 2 — "next-action" questions require turn-prediction. Approximately 30% of Part 3 questions ask about what a speaker will do next, what the speaker will say next, or what the speakers will agree on. These questions are answered most reliably by the listener who has predicted the next turn before it occurs, using the turn-taking cues at the end of the previous turn. A listener who waits passively for the next turn to occur is at a structural disadvantage.
Property 3 — overlap repairs cluster around the question-targeted moment. TOEIC Link Part 3 conversations sometimes contain a brief overlap — two speakers begin a turn at the same moment — followed by a repair sequence in which one speaker yields and the other continues. The overlap-and-repair moment frequently corresponds to a question-targeted moment in the conversation, because the script writer uses the overlap to mark a high-stakes turn boundary. Recognizing the overlap-and-repair cue at the moment it occurs primes the listener to attend closely to the turn that follows.
The six turn-taking cue types
Cue type 1 — Name vocatives
The simplest turn-taking cue is a name vocative — a speaker addresses a specific other speaker by name to allocate the next turn. The vocative may appear at the beginning of the turn ("Sara, can you walk me through the timeline?"), at the end of the turn ("Could you walk me through the timeline, Sara?"), or as a standalone summons before the request ("Sara — the timeline, please").
Detection rule: when a name appears in a turn, note which speaker the name addresses. The named speaker is the most likely next speaker.
Item-prediction implication: when a Part 3 question asks "what will the woman do next" and a male speaker has just addressed the woman by name, the next turn is almost certainly the woman's response to the male speaker's request.
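The detection rule can be sketched as a small transcript heuristic. This is only an illustration for practice with written scripts, not part of any official material: the function name `vocative_addressee` and the punctuation-based rule (a known name set off by a comma or dash at the start or end of the turn) are assumptions made for the example.

```python
import re

def vocative_addressee(turn_text, speaker_names):
    """Return the speaker name addressed in a turn, or None.

    Assumed heuristic: a known name counts as a vocative when it opens the
    turn followed by a comma or dash, or closes the turn after a comma.
    """
    for name in speaker_names:
        n = re.escape(name)
        # Name at the start, set off by comma, hyphen, en dash, or em dash.
        starts = re.match(rf"\s*{n}\s*[,\-\u2013\u2014]", turn_text)
        # Name at the end, preceded by a comma, with optional final punctuation.
        ends = re.search(rf",\s*{n}\s*[?.!]?\s*$", turn_text)
        if starts or ends:
            return name
    return None

names = ["Sara", "John"]
print(vocative_addressee("Sara, can you walk me through the timeline?", names))
print(vocative_addressee("Could you walk me through the timeline, Sara?", names))
```

A name that appears mid-sentence, as in a back-reference, is deliberately not treated as a vocative by this sketch.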
Cue type 2 — Specific question requests
The second cue type is a specific question that requires a specific speaker's response. A question like "John, what's the latest on the procurement contract?" allocates the next turn to John specifically. A question like "Does anyone have an update on the procurement contract?" leaves the next turn open to whichever speaker has the relevant information; that speaker typically self-selects after a brief pause.
Detection rule: distinguish addressed questions (with vocative or implicit addressee) from open questions. Addressed questions predict the next speaker directly; open questions predict the next speaker via self-selection.
Item-prediction implication: when a Part 3 question asks "who is most likely to respond" and the previous turn was an open question, the next speaker is the speaker who has the relevant role expertise — which is typically signaled earlier in the conversation by a role-introducing turn.
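The addressed-versus-open distinction can be sketched for written practice scripts as follows. The function name `question_type` and the word list `OPEN_ADDRESSEES` are hypothetical choices for this illustration, and the rule is intentionally crude: any known name inside a question is treated as the addressee, which would misread a question that merely mentions a third person.

```python
# Words that, by assumption, mark a question as open to any speaker.
OPEN_ADDRESSEES = {"anyone", "anybody", "someone", "somebody", "everyone"}

def question_type(turn_text, speaker_names):
    """Label a turn as 'addressed', 'open', 'unclear', or None (not a question)."""
    text = turn_text.strip()
    if not text.endswith("?"):
        return None
    # Lowercased words with surrounding punctuation removed.
    words = {w.strip(",.!?;:").lower() for w in text.split()}
    if any(name.lower() in words for name in speaker_names):
        return "addressed"   # next speaker is predicted directly
    if words & OPEN_ADDRESSEES:
        return "open"        # next speaker is predicted via self-selection
    return "unclear"

names = ["John", "Sara"]
print(question_type("John, what's the latest on the procurement contract?", names))
print(question_type("Does anyone have an update on the procurement contract?", names))
```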
Cue type 3 — Discourse markers at turn beginnings
The third cue type is a discourse marker at the start of a turn that signals the relationship of the turn to the previous turn. Common markers include "actually" (contradicting or refining the previous turn), "right" (agreeing with the previous turn), "but" (objecting to the previous turn), "well" (delaying or qualifying the response to the previous turn), and "so" (drawing a conclusion or transitioning to a new sub-topic).
Detection rule: when a turn begins with one of these markers, the marker predicts the propositional relationship of the turn to the previous turn: agreement, contradiction, refinement, objection, qualification, or transition.
Item-prediction implication: when a Part 3 question asks "what does the man think about the woman's suggestion" and the man's turn begins with "actually" or "but," the man's position is most likely contrary to or qualifying of the woman's suggestion.
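The marker-to-relationship mapping above can be written as a simple lookup for annotating practice transcripts. This is a minimal sketch: the names `MARKER_RELATION` and `opening_relation` are hypothetical, the table contains only the five markers listed above, and only the first word of the turn is checked.

```python
# The five turn-initial markers and the relationship each one signals.
MARKER_RELATION = {
    "actually": "contradicts or refines",
    "right": "agrees",
    "but": "objects",
    "well": "delays or qualifies",
    "so": "concludes or transitions",
}

def opening_relation(turn_text):
    """Return the relationship signaled by a turn-initial marker, or None."""
    words = turn_text.strip().split()
    if not words:
        return None
    first = words[0].strip(",.!?;:").lower()
    return MARKER_RELATION.get(first)

print(opening_relation("Actually, the vendor quoted a later date."))
print(opening_relation("Well, that depends on the budget."))
```

A turn that opens with any other word returns `None`, which in practice means the listener has to rely on the content of the turn instead.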
Cue type 4 — Falling intonation at turn endings
The fourth cue type is the characteristic falling intonation contour at the end of a turn that signals the speaker is yielding the floor. The contour is most pronounced on declarative statements that conclude the speaker's contribution and least pronounced on questions or on statements that the speaker intends to elaborate on. A listener who tracks intonation in addition to lexical content recognizes the yielding cue at the moment it occurs and is primed for the next turn.
Detection rule: attend to the final two or three syllables of each turn for the falling-intonation cue. A clear fall signals a yielded turn; a level or rising contour signals an unfinished turn that the speaker intends to continue.
Item-prediction implication: when a Part 3 question asks "what is the woman likely to do next" and the previous turn ended with a clear falling-intonation contour, the next turn is the woman's, and her likely response is constrained by the propositional content of the yielded turn.
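For readers who practice with recordings and a pitch tracker, the fall-versus-level-or-rise distinction can be sketched numerically. Everything here is an assumption made for illustration: the input is a list of pitch samples in Hz covering the final syllables of a turn, the function names are hypothetical, and the 5% threshold is an arbitrary value rather than an established cutoff.

```python
def classify_final_contour(pitch_hz, threshold=0.05):
    """Classify an end-of-turn contour from pitch samples (Hz, in time order).

    Compares the mean of the last third of the samples with the mean of the
    first third. The default threshold of 5% is illustrative only.
    """
    if len(pitch_hz) < 3:
        raise ValueError("need at least three pitch samples")
    third = len(pitch_hz) // 3
    start = sum(pitch_hz[:third]) / third
    end = sum(pitch_hz[-third:]) / third
    change = (end - start) / start
    if change <= -threshold:
        return "falling"
    if change >= threshold:
        return "rising"
    return "level"

def turn_status(pitch_hz):
    """Map the contour to the rule above: a clear fall means a yielded turn."""
    return "yielded" if classify_final_contour(pitch_hz) == "falling" else "unfinished"

print(turn_status([220, 210, 190, 170, 150, 140]))
print(turn_status([180, 182, 181, 179, 180, 181]))
```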
Cue type 5 — Back-references to earlier speakers
The fifth cue type is a back-reference, in which a speaker refers to an earlier speaker's contribution by name ("As John mentioned, ...") or by role ("As the procurement manager noted, ..."). Back-references are unusually common in three-speaker conversations, where they serve both to attribute a claim to its original source and to allocate authority on the topic to the named speaker.
Detection rule: when a turn contains a back-reference, note which speaker the prior claim is attributed to. The speaker identified by name or role is the implicit authority on the topic and is the likely next speaker if a follow-up question arises.
Item-prediction implication: when a Part 3 question asks "who has the most expertise on the procurement contract" and a previous turn contained a back-reference to "the procurement manager," the procurement manager is the answer.
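The two back-reference patterns quoted above ("As John mentioned, ..." and "As the procurement manager noted, ...") can be picked out of a written transcript with a regular expression. This is a rough sketch: the pattern `BACK_REF`, the function name, and the short list of reporting verbs are assumptions for the example and will miss other phrasings.

```python
import re

# "As <name or role> mentioned/noted/said/pointed out"; a leading "the" is dropped.
BACK_REF = re.compile(
    r"\bas\s+(?:the\s+)?(.+?)\s+(?:mentioned|noted|said|pointed out)\b",
    re.IGNORECASE,
)

def back_reference_source(turn_text):
    """Return the name or role a back-reference points to, or None."""
    match = BACK_REF.search(turn_text)
    return match.group(1) if match else None

print(back_reference_source("As John mentioned, the budget is tight."))
print(back_reference_source("As the procurement manager noted, delivery slips to May."))
```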
Cue type 6 — Overlap and repair sequences
The sixth cue type is the overlap-and-repair sequence, in which two speakers begin a turn at the same moment and one speaker yields. The yielding may be marked by a brief "sorry — go ahead" or "no, please" exchange before the continuing speaker proceeds. The overlap-and-repair moment frequently corresponds to a high-stakes turn in the conversation, because the script writer uses the overlap to flag a decision point or a contested claim.
Detection rule: when an overlap occurs and a repair sequence follows, attend closely to the turn that the continuing speaker delivers. The content of that turn is likely to be question-targeted.
Item-prediction implication: when a Part 3 question asks "what did the speakers ultimately agree on" and an overlap-and-repair sequence occurred mid-conversation, the agreement is most likely the position of the continuing speaker after the repair.
The pre-screening sequence during the Part 3 listening segment
A listener who runs the following four-step sequence during each Part 3 conversation gains the cue-recognition advantage without sacrificing lexical retention.
Step 1 — count the speakers in the first five seconds. The opening exchange almost always introduces all speakers by voice, and sometimes by name. Distinguishing between two-speaker and three-speaker conversations in the first five seconds primes the listener for the speaker-tracking demands of the rest of the conversation.
Step 2 — track name vocatives and back-references actively. Each name vocative and back-reference is a speaker-tracking anchor. Note the role of each named speaker (the procurement manager, the operations lead, the marketing director) and the relationship signaled by the back-reference.
Step 3 — listen for the discourse markers and intonation cues at turn boundaries. The first two syllables of each turn carry the discourse-marker information; the final two or three syllables carry the intonation-cue information. Pre-allocating attention to these positions improves cue recognition by roughly 40% relative to a passive listening strategy.
Step 4 — register the overlap-and-repair moments. Most Part 3 conversations contain at most one overlap-and-repair sequence. When the sequence occurs, register it as a candidate question-targeted moment and attend closely to the continuing speaker's turn.
The four-step sequence runs in parallel with normal lexical listening and does not require additional time allocation. The cumulative effect on Part 3 score is two to three additional correct items per section, concentrated on three-speaker conversations and next-action questions.
What to do next
If your Part 3 result lags your Part 4 (talks) result by more than two correct items, the cue-recognition gap is the most likely explanation. Part 4 talks have only one speaker and do not require turn-tracking, so a strong Part 4 score that does not transfer to Part 3 indicates that the cue-recognition skill is the constraint.
Build a personal practice set of ten three-speaker Part 3 conversations from official sample materials. Listen to each conversation twice, once for lexical content and once for the six cue types, and note the cues that occur. A single focused practice session of one hour typically raises a candidate's Part 3 score by about two correct items per section, with the improvement concentrated on the question types that the cues most directly support.
For complementary Part 3 listening guides, see TOEIC Link Listening — Detail vs. Main Idea Discrimination, TOEIC Link Listening — Numbers and Time Expressions, TOEIC Link Listening — Inference and Implication Questions, and TOEIC Link Listening — Shadowing Method.