TOEIC Link Listening — Intonation and Emphasis: How Pitch Movement and Stress Patterns Encode the Answer in Parts 3 and 4

TOEIC Link Parts 3 and 4 reward candidates who hear pitch movement and stress placement as meaning, not as decoration. A speaker who lifts the pitch on a contrast word is announcing the answer to the next question; a speaker who flattens pitch on a discourse marker is telling the listener what to skip. This guide maps the four pitch patterns and the three stress functions that account for roughly 35% of inference and gist items in the listening module.

EnglishBlitz Editorial Team

TOEIC Link Parts 3 (short conversations) and 4 (short talks) include a category of items that reward candidates who hear intonation (the pitch movement of the voice) and emphasis (the stress placement on individual words) as carriers of meaning rather than as incidental decoration. The category accounts for roughly 35% of inference and gist questions in the listening module, and the candidates who answer these items reliably are not necessarily those with the largest vocabularies. They are candidates who have internalized that a rising pitch on a contrast word, a falling pitch on a list closer, or a stressed syllable on a function word is doing semantic work that a transcript of the audio does not capture. Score-band data from internal practice corpora indicate that the gap between candidates who hear the prosodic cues and those who do not is roughly four to six items per listening section, which corresponds to one band on the 0-to-30 score.

This guide describes the four pitch movement patterns that recur in TOEIC Link audio, the three stress functions that the test exploits, and the listening drills that build the cue-recognition habit. For related Part 3 and Part 4 topics, see the guides on listening turn-taking cues, inference and implication questions, and sentence stress and rhythm for listening.

Why intonation and emphasis are testable

A naive view of listening assessment treats the audio as a transcript-equivalent — the listener is asked to recover the words and then to answer questions about the words. The view is incomplete in two respects.

Respect 1 — prosody disambiguates structurally ambiguous sentences. English sentences with identical surface words can carry different meanings depending on pitch movement. The sentence "She didn't leave because she was tired" can mean either "she stayed, and the reason was that she was tired" (the negation applies to leaving) or "she did leave, but the reason was not tiredness" (the negation applies to the reason). The two meanings are distinguished by pitch placement on different words. A listener who cannot hear the pitch placement cannot select the correct interpretation, even if the lexical content is fully recovered.

Respect 2 — prosody marks salient content. Speakers use stress and pitch to mark the parts of an utterance that they want the listener to attend to. A speaker who is announcing a price increase will stress the new price and de-stress the old one; a speaker who is contrasting two options will lift the pitch on the option being recommended. The prosodic marking is a structured signal about which content is the answer to the next question. A listener who treats all content as uniformly salient will distribute attention evenly across the audio and miss the speaker's own indication of where the important content is.

The TOEIC Link listening module exploits both respects in its item construction. Questions are constructed around the prosodically marked content, and distractor answers are constructed around the prosodically unmarked content. The pattern is consistent enough that candidates who train on prosodic-cue recognition can predict the question target before the question is asked.

The four pitch movement patterns

Four pitch movement patterns recur in TOEIC Link audio with sufficient frequency to be worth memorizing. Each pattern has a characteristic shape and a characteristic semantic function.

Pattern 1 — the rise-fall on contrast

The rise-fall pattern is a pitch contour that rises on a key word and then falls sharply on the same word or the immediately following word. The pattern marks a contrast — the speaker is signaling that the content being said now differs from content that was said earlier or that the listener might have expected.

The diagnostic feature of the rise-fall is the sharpness of the fall. A casual pitch rise on a key word that does not fall is a different pattern (the level rise, which marks continuation rather than contrast). A rise-fall has a distinct downward movement that creates a sense of finality on the contrasted word.

Question targets that follow rise-fall patterns are typically of the form "what does the speaker mean by X?" or "what does the speaker imply about Y?" The contrasted word is the answer key, and the candidate's job is to identify the contrast partner — the implicit content that the speaker is contrasting the current content with. The contrast partner is typically either prior content in the audio (in Part 3) or background knowledge that the audio assumes (in Part 4).

Pattern 2 — the level rise on continuation

The level rise pattern is a pitch contour that rises on a key word and stays high without falling. The pattern marks continuation — the speaker is signaling that more content is coming and that the current content should be held in working memory pending the additional content.

The diagnostic feature of the level rise is the absence of a fall. The pitch does not return to baseline at the end of the marked word; instead, it remains elevated through the following clause or until the speaker reaches the actual fall that closes the utterance.

Question targets that follow level rise patterns are typically of the form "what will the speaker do next?" or "what is the next step?" The level rise is a forward-pointing cue — it tells the listener that the answer to the question is in the upcoming content, not in the content currently being said. A listener who hears the level rise and prepares attention for the upcoming content captures the answer; a listener who treats the marked word as the answer mis-locates the target.

Pattern 3 — the fall on closure and list closing

The fall pattern is a pitch contour that descends sharply from a high pitch to a low pitch over the final word of an utterance or list. The pattern marks closure — the speaker is signaling that the current topic, list, or argument is complete and that the listener should release working memory from holding open content.

The diagnostic feature of the closing fall is its position at the end of a syntactic unit. The fall on a list closing is distinct from the fall in a rise-fall: the rise-fall pairs a rise with an immediately following fall on the same word, while the closing fall descends over the final word of a complete unit and is typically preceded by neutral pitch on the immediately prior content.

Question targets that follow closing falls are typically of the form "how many items did the speaker mention?" or "what is the final item in the list?" The closing fall is a list-boundary cue that allows the listener to count the items confidently. A listener who does not hear the closing fall may over-count or under-count the list, both of which produce wrong answers on quantitative items.

Pattern 4 — the high pitch on emphasis

The high pitch pattern is a pitch contour that places a single word at a noticeably higher pitch than the surrounding words, without a sharp rise or fall. The pattern marks emphasis — the speaker is signaling that the marked word is the most important word in the current utterance.

The diagnostic feature of the high pitch is the pitch differential against the surrounding context. A high-pitched word in a low-pitched sentence stands out by 50 to 100 Hertz, which is large enough that even a non-native ear can detect the differential with training.

Question targets that follow high-pitch emphasis are typically of the form "what is the main point of the talk?" or "what is the speaker primarily concerned with?" The emphasized word is the gist anchor, and the candidate's job is to find the answer choice that paraphrases the emphasized content.
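
The four patterns and their typical question forms can be summarized in a small sketch. It is an illustration only, not a description of how TOEIC Link items are built: the function name classify_contour, the input format (pitch samples in Hz), and the 50 Hz threshold for a "sharp" rise or fall are assumptions chosen to match the descriptions above.

```python
from statistics import mean

# Typical question forms for each pattern, as listed in the sections above.
QUESTION_FORMS = {
    "rise-fall": ["What does the speaker mean by X?",
                  "What does the speaker imply about Y?"],
    "level rise": ["What will the speaker do next?",
                   "What is the next step?"],
    "closing fall": ["How many items did the speaker mention?",
                     "What is the final item in the list?"],
    "high pitch": ["What is the main point of the talk?",
                   "What is the speaker primarily concerned with?"],
}


def classify_contour(context_hz, word_hz, is_final=False, threshold_hz=50.0):
    """Label the pitch pattern on one word (hypothetical helper).

    context_hz: pitch samples (Hz) from the surrounding words.
    word_hz:    pitch samples (Hz) across the word being examined.
    is_final:   True when the word ends an utterance or list.
    threshold_hz: assumed minimum movement that counts as "sharp".
    """
    baseline = mean(context_hz)
    start, peak, end = word_hz[0], max(word_hz), word_hz[-1]
    rise = peak - start   # upward movement inside the word
    fall = peak - end     # downward movement after the peak

    if rise >= threshold_hz and fall >= threshold_hz:
        return "rise-fall"        # rises, then falls sharply: contrast
    if rise >= threshold_hz:
        return "level rise"       # rises and stays high: continuation
    if is_final and fall >= threshold_hz:
        return "closing fall"     # descends over the final word: closure
    if (mean(word_hz) - baseline >= threshold_hz
            and fall < threshold_hz):
        return "high pitch"       # flat but well above context: emphasis
    return "unmarked"


if __name__ == "__main__":
    label = classify_contour([120, 118, 122], [120, 200, 110])
    print(label, QUESTION_FORMS.get(label, []))
```

Real pitch tracks are noisier than these toy lists, so the thresholds would need tuning per speaker; the point of the sketch is only the decision order, which checks for a fall before deciding between contrast and continuation.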

The three stress functions

Pitch movement is one dimension of prosody; word stress is another. Three stress functions recur in TOEIC Link audio with sufficient frequency to be worth memorizing.

Function 1 — lexical stress on minimal pairs

English has many minimal pairs distinguished only by stress placement. The noun CONtract (a binding agreement) and the verb conTRACT (to shrink, or to enter into a contract) share an identical spelling but are distinguished by which syllable carries the primary stress. Similar pairs include OBject vs. obJECT, REcord vs. reCORD, PROject vs. proJECT, and SUSpect vs. susPECT.

TOEIC Link uses these stress pairs as distractor traps. A question may ask about a "project deadline" while the audio uses the verb proJECT (for example, to project demand), and the distractor answer treats the verb as the noun. The remediation is to internalize the noun-verb stress alternation as an automatic perception, not a deliberate parse.
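
The noun-verb alternation can be written down as a lookup, using the capital-letter stress notation from this guide. The sketch below is illustrative: the helper name part_of_speech is hypothetical, and the word list covers only the pairs named above.

```python
import re

# Two-syllable noun-verb pairs named in this guide.
NOUN_VERB_PAIRS = {"contract", "object", "record", "project", "suspect"}


def part_of_speech(marked: str) -> str:
    """Return 'noun' or 'verb' for a stress-marked word such as 'CONtract'.

    Capital letters mark the stressed syllable. For the pairs above,
    first-syllable stress is the noun and second-syllable stress is the verb.
    """
    if marked.lower() not in NOUN_VERB_PAIRS:
        raise KeyError(f"{marked!r} is not in the pair list")
    match = re.fullmatch(r"([a-z]*)([A-Z]+)([a-z]*)", marked)
    if match is None:
        raise ValueError(f"{marked!r} has no single stressed syllable marked")
    before, _stressed, after = match.groups()
    if not before and after:
        return "noun"   # stress at the start: CONtract, PROject
    if before and not after:
        return "verb"   # stress at the end: conTRACT, proJECT
    raise ValueError(f"cannot place the stress in {marked!r}")


if __name__ == "__main__":
    for word in ["PROject", "proJECT", "REcord", "reCORD"]:
        print(word, "->", part_of_speech(word))
```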

Function 2 — phrasal stress on compound nouns

English compound nouns place primary stress on the first element, while the corresponding adjective-noun phrases place it on the noun: the WHITE House (the official residence) vs. a white HOUSE (any house that is white). Similar compounds include DARKroom, GREENhouse, BLUEbird, and the business-vocabulary compounds SALESforce, MARKETplace, and STAFFroom.

The stress placement carries meaning: the compound denotes a specific institutional or category-defined referent, while the adjective-noun phrase denotes any instance with the relevant attribute. TOEIC Link exploits the distinction in business-context items where the candidate must select the institutional referent rather than the literal one. A listener who hears the compound stress pattern correctly recognizes that the speaker is referring to a specific institution or category rather than to a generic instance.

Function 3 — focus stress on function words

Function words — articles, prepositions, auxiliaries, and pronouns — are typically unstressed in connected speech. When a speaker places stress on a function word, the stress is doing focus work — the speaker is making a contrast or correction at the function-word level.

The sentence "I saw HER" (with stress on the pronoun) contrasts the seen person with someone else who was expected to be seen. The sentence "I DID send the email" (with stress on the auxiliary) contradicts a prior claim that the email was not sent. Focus stress on function words is one of the most reliable inference cues in the listening module because the stressing is so unusual against the baseline that the marked word draws immediate attention.
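
When annotating practice transcripts by hand, the same capital-letter notation makes it easy to pull out function-word focus stress automatically. The sketch below is a minimal illustration: the function name and the short FUNCTION_WORDS list are assumptions, not an exhaustive inventory of English function words.

```python
# A small, deliberately incomplete sample of function words (assumed list).
FUNCTION_WORDS = {
    "a", "an", "the",                          # articles
    "in", "on", "at", "to", "of", "for",       # prepositions
    "is", "are", "was", "were", "do", "does",  # auxiliaries
    "did", "have", "has", "had", "can", "will",
    "you", "he", "she", "it", "we", "they",    # pronouns
    "me", "him", "her", "us", "them",
}


def focus_stressed_function_words(sentence: str) -> list[str]:
    """Return the function words written in capitals in an annotated sentence."""
    hits = []
    for token in sentence.split():
        word = token.strip(".,!?;:\"'")
        # The pronoun "I" is always capitalized, so a single letter cannot
        # be read as a stress mark in this notation and is skipped.
        if len(word) > 1 and word.isupper() and word.lower() in FUNCTION_WORDS:
            hits.append(word)
    return hits


if __name__ == "__main__":
    print(focus_stressed_function_words("I saw HER."))
    print(focus_stressed_function_words("I DID send the email."))
```

Stressed content words are ignored on purpose, because stress on a content word is the normal case and does not carry the same correction signal.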

Drills that build prosodic-cue recognition

Three drills, run for 15 to 20 minutes per day, build the cue-recognition habit over six to eight weeks.

Drill 1 — pitch-tracking with the eyes closed. Play a Part 3 conversation with the eyes closed and the transcript hidden. Listen for the pitch contour rather than the lexical content. After each turn, sketch the pitch contour on paper using a simple up-arrow or down-arrow notation for the marked words. The drill builds the perceptual habit of tracking pitch independently of meaning, which is the prerequisite for using pitch as a semantic cue.

Drill 2 — emphasis prediction. Read a Part 4 transcript silently first and predict which word in each sentence the speaker will emphasize. Then play the audio and check the prediction. The drill builds the cognitive habit of anticipating prosodic marking based on the discourse function of the words, which is the prerequisite for using prosody to predict question targets.

Drill 3 — minimal-pair discrimination. Use a curated list of noun-verb minimal pairs (CONtract / conTRACT, REcord / reCORD, etc.) and have a partner or audio recording produce one of the two pronunciations at random. The drill builds the perceptual habit of distinguishing stress placements that look identical in print, which is the prerequisite for accurate transcription-level comprehension.
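
For candidates without a practice partner, Drill 3 can be scripted. The sketch below only generates and scores the random trials; the stress-marked string stands in for the spoken prompt, which a real drill would deliver as audio. All names (PAIRS, make_trial, run_session) are hypothetical.

```python
import random

# Syllable splits for the noun-verb pairs named in this guide.
PAIRS = [("con", "tract"), ("re", "cord"), ("ob", "ject"),
         ("pro", "ject"), ("sus", "pect")]


def make_trial(rng: random.Random) -> tuple[str, str]:
    """Pick a random pair and a random role; return (marked prompt, answer)."""
    first, second = rng.choice(PAIRS)
    role = rng.choice(["noun", "verb"])
    if role == "noun":
        prompt = first.upper() + second    # e.g. CONtract
    else:
        prompt = first + second.upper()    # e.g. conTRACT
    return prompt, role


def run_session(listener, n_trials: int = 10, seed=None) -> float:
    """Run n_trials and return the fraction the listener labels correctly.

    listener: a callable that takes the prompt and returns 'noun' or 'verb'.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        prompt, role = make_trial(rng)
        if listener(prompt) == role:
            correct += 1
    return correct / n_trials


if __name__ == "__main__":
    # A stand-in listener that reads the notation instead of hearing audio.
    def ideal(prompt: str) -> str:
        return "noun" if prompt[0].isupper() else "verb"

    print(run_session(ideal, n_trials=20, seed=1))
```

Passing a seed makes a session repeatable, which is useful when comparing scores on the same trial sequence from week to week.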

The drills are most effective when combined with paraphrase-recognition practice, because the paraphrase task forces the candidate to map the prosodically marked content to alternative wordings that the answer choices may use. For broader pacing and time budgeting across the listening module, see the guide on pacing and time management.

Common pitfalls

Three pitfalls recur in candidate practice and are worth naming.

Pitfall 1 — equating volume with stress. Stress is signaled primarily by pitch and duration; loudness is a weaker and less reliable cue. A loudly spoken word is not necessarily a stressed word. The remediation is to listen for pitch and timing rather than for loudness, which often requires consciously suppressing the volume-as-stress heuristic.

Pitfall 2 — over-attending to the first stressed word. Candidates who learn to listen for stress often fixate on the first stressed word in an utterance and miss the second or third, which may be the actual answer key. The remediation is to maintain attention through the entire utterance and to update the working hypothesis about the answer key as new stressed content arrives.

Pitfall 3 — under-attending to native-speaker reductions. Reduced forms such as "gonna" for "going to," "wanna" for "want to," and "lemme" for "let me" signal that the reduced function words are unstressed. A candidate who treats the reductions as lexical content to be decoded has missed the prosodic signal that the surrounding content is the actual focus.

A worked example

Consider a Part 4 short talk that opens with the sentence "Our quarterly results — and I want to emphasize this — exceeded the BOARD's expectations, not the analysts'." The sentence has four prosodic cues:

  • The level rise on "results" marks continuation and tells the listener to hold the topic open.
  • The high pitch on "emphasize" marks emphasis and signals that the speaker is announcing the main point.
  • The rise-fall on "BOARD" marks a contrast, and the explicit contrast partner is "analysts'."
  • The closing fall on "analysts'" marks the end of the contrastive unit and indicates that the main point is complete.

A question that follows this sentence is likely to ask "whose expectations did the results exceed?", and the correct answer is "the board." A candidate who heard the contrastive stress on "BOARD" can answer the question before it is asked; a candidate who did not hear the stress must reconstruct the answer from the lexical content and may be misled by the proximity of "analysts'" at the end of the sentence.

The example illustrates the broader pattern. Prosodic cues are not a supplement to lexical comprehension — they are an independent channel of meaning that the TOEIC Link listening module systematically tests. Candidates who train on the four pitch patterns and the three stress functions, and who run the daily drills until the perception is automatic, capture the four-to-six-item gap that prosody-aware listening produces.

Next steps

The intonation and emphasis training compounds with three adjacent skills. Build shadowing as a daily 20-minute habit to anchor the prosodic patterns in production as well as perception; layer sentence stress and rhythm drills to build the rhythmic baseline that prosodic deviations stand out against; and integrate the prosodic-cue habit into the listening-strategies-by-question-type framework so that the cue recognition is deployed in service of the specific question targets that the test asks.