TOEIC Link Listening — Deictic Expression and Spatial Reference Resolution

Deictic expressions on the TOEIC Link Listening module (here, there, this, that, now, then) anchor meaning to a speaker context that the candidate cannot see. This article covers the four deictic categories the test exploits, the speaker-context reconstruction protocol that resolves reference in real time, and a four-week training sequence that converts deictic recognition into reflexive comprehension.

EnglishBlitz Editorial Team

The TOEIC Link Listening module routinely uses deictic expressions (here, there, this, that, now, then, these people, that side) to anchor meaning to a speaker context that the candidate cannot see. Where a reading passage can name the referent explicitly, the listening passage relies on the deictic to do the referential work, and the candidate has to reconstruct the speaker context to resolve it correctly. Above the 80-percent accuracy band, deictic mis-resolution becomes a measurable error source: the phonetic and lexical gaps that cost lower-band candidates most of their points are no longer dominant, so the remaining errors concentrate in pragmatic operations, such as deictic resolution, that depend on inferred speaker context.

This article covers the four deictic categories that the test exploits, the speaker-context reconstruction protocol that resolves deictic reference in real time, the three failure modes that turn a tractable deictic into a wrong answer, and a four-week training sequence that converts deictic recognition into reflexive comprehension during the listening loop.

Why deictic resolution is a measurable error source above the 80-percent band

Candidates below the 80-percent band lose listening items primarily to phonetic and lexical comprehension limits — they have not yet decoded all the words, or they have decoded the words but not yet recovered the proposition. Above the 80-percent band, the lexical layer is solid and the error pool migrates toward operations that the candidate has to perform on top of comprehension. Deictic resolution is one of those operations, and it is the most common source of higher-band errors on the question types that depend on the candidate inferring the speaker context.

The test concentrates deictic items in three question forms. The first is the spatial-reference question ("Where is the speaker likely to be located?"), which requires the candidate to resolve a "here" or "there" to a specific physical or organisational location. The second is the temporal-reference question ("When did the event the speaker is describing occur?"), which requires the candidate to resolve a "now" or "then" relative to the moment of speaking. The third is the participant-reference question ("Who is the speaker addressing when they use the phrase 'you and your team'?"), which requires the candidate to resolve a person-deictic to a specific participant or group in the speaker context.

For related coverage of how pragmatic operations interact with discourse decoding at the listening level, see "pragmatic implicature and conventional inference recognition" and "speaker role and relational decoding".

The four deictic categories the test exploits

The four deictic categories that account for nearly all deictic mis-resolution on the test are distinguishable by what they anchor — physical space, time, participants, or discourse. Recognizing the category in real time is what makes the resolution protocol efficient, because each category calls for a slightly different reconstruction sequence.

Category 1 — Spatial deictic

Spatial deictics — here, there, this room, that floor, over by the entrance — anchor meaning to a physical or organisational location relative to the speaker. The deictic does not name the location; it expects the listener to recover the location from the surrounding speaker context. Cues that help recover the spatial context include the speaker's role description, the named events or activities happening at the moment of speaking, and the named objects or people the speaker references alongside the deictic.

The risk is that the candidate hears the spatial deictic and tries to map it to a location named earlier in the passage, when in fact the deictic anchors to the implicit speaker location that has to be inferred. The correction is to maintain a working hypothesis about the speaker location from the first few seconds of the passage and to update the hypothesis only when explicit cues shift the location.

Category 2 — Temporal deictic

Temporal deictics — now, then, at the moment, at the time, these days, back then — anchor meaning to a point in time relative to the moment of speaking. The deictic does not name the absolute time; it expects the listener to recover the time from the surrounding temporal cues. Cues that help recover the temporal context include named dates, named events, tense shifts, and aspect shifts that mark the temporal frame.

The risk is that the candidate hears the temporal deictic and adopts the moment-of-speaking as the reference frame without checking whether the speaker is referencing a different temporal frame within the same utterance. The correction is to track the temporal frame on a scratch line and to update the frame whenever the speaker shifts tense or introduces a temporal anchor.

Category 3 — Participant deictic

Participant deictics — you, we, they, your team, our side, their people — anchor meaning to a person or group relative to the speaker's discourse position. The deictic does not name the participant; it expects the listener to recover the participant from the surrounding discourse and from the speaker's relational stance. Cues that help recover the participant context include the speaker's named role, the addressee's named role, and the relational verbs that mark the speaker's stance toward the addressee or third party.

The risk is that the candidate hears "we" and treats it as referring to the speaker plus the addressee, when in fact "we" refers to the speaker's organisation excluding the addressee. The correction is to flag every "we" and "you" as a hypothesis-trigger and to test the hypothesis against the surrounding relational verbs and role cues.

Category 4 — Discourse deictic

Discourse deictics — this, that, the latter, the former, the point I was making — anchor meaning to a proposition or stretch of discourse that has been mentioned earlier or is about to be mentioned. The deictic does not name the proposition; it expects the listener to retrieve the proposition from memory of the recent discourse. Cues that help recover the discourse referent include the discourse marker that introduced the proposition and the proposition's position in the rhetorical structure.

The risk is that the candidate loses track of the discourse antecedent across a few seconds of intervening speech and resolves "that" to a recent noun phrase rather than to the embedded proposition the speaker actually intended. The correction is to maintain a two-slot proposition buffer that holds the most recent two propositions and to consult the buffer when a discourse deictic is heard.
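For readers who find it easier to picture the buffer as a data structure, here is a minimal Python sketch of the two-slot idea. The class name PropositionBuffer, its method names, and the sample propositions are illustrative assumptions and do not come from any test material; the only point is that a third proposition pushes out the oldest one.

```python
from collections import deque


class PropositionBuffer:
    """Hypothetical model of the two-slot buffer: keeps the two latest propositions."""

    def __init__(self):
        # maxlen=2 makes the deque discard the oldest entry automatically.
        self._slots = deque(maxlen=2)

    def push(self, proposition: str) -> None:
        """Store a newly heard proposition, evicting the oldest if the buffer is full."""
        self._slots.append(proposition)

    def most_recent(self):
        """Return the latest proposition, or None if nothing has been heard."""
        return self._slots[-1] if self._slots else None

    def previous(self):
        """Return the proposition before the latest, or None if there is none."""
        return self._slots[0] if len(self._slots) == 2 else None


# Invented sample propositions, pushed in the order they are heard.
buffer = PropositionBuffer()
buffer.push("the shipment was delayed by customs")
buffer.push("the client agreed to a later deadline")
buffer.push("the budget will not change")
```

After the third push, the shipment proposition is gone. A discourse deictic heard at this point would be checked first against the budget proposition and then against the deadline proposition.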

The speaker-context reconstruction protocol

The speaker-context reconstruction protocol resolves deictic reference in four steps that can be executed during the first pass of the passage. The protocol assumes that the candidate can already recognize the four deictic categories on hearing; the four steps run as each deictic is heard.

Step 1 — Identify the deictic and tag its category

The first step is to recognize that a deictic has been uttered and to tag it with one of the four categories — spatial, temporal, participant, or discourse. The tagging is automatic for the trained candidate and takes under a second per deictic. Without explicit tagging, the candidate is likely to default to surface-form resolution and miss the deictic operation entirely.
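The tagging step can be pictured as a lookup from surface form to category. The Python sketch below is a simplified illustration that uses only the example expressions listed in this article. The table DEICTIC_CATEGORIES and the function tag_deictic are assumptions made for illustration, and treating bare "this" and "that" as discourse deictics is a simplification, because in real speech the category depends on context.

```python
# Example expressions taken from the four category descriptions in this article.
# The table itself is an illustrative assumption, not an exhaustive inventory.
DEICTIC_CATEGORIES = {
    "spatial": ["here", "there", "this room", "that floor", "over by the entrance"],
    "temporal": ["now", "then", "at the moment", "at the time", "these days", "back then"],
    "participant": ["you", "we", "they", "your team", "our side", "their people"],
    "discourse": ["this", "that", "the latter", "the former", "the point i was making"],
}


def tag_deictic(expression: str):
    """Return the category of a known deictic expression, or None if it is not listed."""
    normalized = expression.strip().lower()
    for category, forms in DEICTIC_CATEGORIES.items():
        # Whole-expression match, so "this room" is not confused with bare "this".
        if normalized in forms:
            return category
    return None
```

Matching on the whole expression rather than on single words keeps "this room" (spatial) apart from bare "this" (discourse), which mirrors the point that surface form alone is not enough to pick the category.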

Step 2 — Retrieve the relevant context slot

The second step is to retrieve the relevant context slot — the spatial hypothesis, the temporal frame, the participant map, or the discourse proposition buffer — depending on the deictic category. Each context slot is maintained as a working hypothesis throughout the passage and updated when explicit cues shift the slot value.

Step 3 — Apply the deictic to the slot value

The third step is to apply the deictic to the current slot value to produce the referent. For spatial "here", the referent is the current spatial hypothesis. For temporal "now", the referent is the current temporal frame, which is the moment of speaking unless an earlier cue has shifted it. For participant "we", the referent is the current speaker-group hypothesis. For discourse "that", the referent is the most recent proposition in the buffer.

Step 4 — Confirm against subsequent context

The fourth step is to confirm the resolution against the subsequent speech, which typically contains a confirming cue within a few seconds of the deictic. If the confirming cue contradicts the resolution, return to step 2 and retrieve a different context slot value. The confirmation step catches the cases where the working hypothesis was wrong, and it is what prevents anchor-drift errors from propagating through the rest of the passage.
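The four steps can be sketched as a small Python model. This is a teaching illustration under stated assumptions, not a description of how listeners actually process speech. The names SpeakerContext, resolve, and confirm are hypothetical, the sample locations and groups are invented, and confirmation is reduced to a simple equality check against a later cue.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeakerContext:
    """The four context slots, each held as a working hypothesis (illustrative)."""
    spatial: str = "unknown location"
    temporal: str = "moment of speaking"
    participants: dict = field(default_factory=dict)  # deictic -> group
    propositions: list = field(default_factory=list)  # most recent last


def resolve(category: str, deictic: str, ctx: SpeakerContext) -> str:
    """Steps 2 and 3: retrieve the slot for the tagged category and apply the deictic."""
    if category == "spatial":
        return ctx.spatial
    if category == "temporal":
        return ctx.temporal
    if category == "participant":
        return ctx.participants.get(deictic, "unresolved")
    if category == "discourse":
        return ctx.propositions[-1] if ctx.propositions else "unresolved"
    raise ValueError(f"unknown category: {category}")


def confirm(referent: str, later_cue: Optional[str]) -> bool:
    """Step 4: the resolution stands unless a later cue contradicts it."""
    return later_cue is None or later_cue == referent


# Step 1 is assumed done: "here" has already been tagged as spatial.
ctx = SpeakerContext(
    spatial="warehouse",
    participants={"we": "the supplier", "you": "the client"},
)
referent = resolve("spatial", "here", ctx)

# A later cue points to a different location, so the slot is updated and re-applied.
if not confirm(referent, "head office"):
    ctx.spatial = "head office"
    referent = resolve("spatial", "here", ctx)
```

In this example the first resolution of "here" is the warehouse, the later cue contradicts it, and the corrected referent becomes the head office, which is the return to step 2 that the protocol describes.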

The three failure modes to avoid

The first failure mode is surface-form mapping: treating the deictic as referring to the most recent surface noun phrase. This failure mode is the dominant source of error on participant and discourse deictics. The correction is the explicit category-tagging and slot-retrieval sequence.

The second failure mode is context-slot freeze: failing to update the working hypothesis when an explicit cue shifts the slot value. This failure mode is the dominant source of error on long passages where the speaker shifts location, time, or participant group mid-utterance. The correction is to register explicit shift cues, such as "at the new office", "back in 2018", or "from the customer's perspective", as slot-update triggers.
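To make the idea of a slot-update trigger concrete, the Python sketch below matches the three example cues against an utterance and overwrites the corresponding slot. The regular expressions, the slot names, and the function apply_shift_cues are assumptions written for these three phrasings only; real shift cues are far more varied than any short pattern list can capture.

```python
import re

# One hypothetical pattern per example cue, each paired with the slot it updates.
SHIFT_CUES = [
    (re.compile(r"\bat the (\w+(?: \w+)* office)\b"), "spatial"),
    (re.compile(r"\bback in (\d{4})\b"), "temporal"),
    (re.compile(r"\bfrom the (\w+)'s perspective\b"), "participant"),
]


def apply_shift_cues(utterance: str, slots: dict) -> dict:
    """Return a copy of the slots with every slot updated whose shift cue was heard."""
    updated = dict(slots)
    text = utterance.lower()
    for pattern, slot in SHIFT_CUES:
        match = pattern.search(text)
        if match:
            # The captured group becomes the new working hypothesis for that slot.
            updated[slot] = match.group(1)
    return updated


# Invented starting hypotheses and an invented utterance containing two cues.
slots = {
    "spatial": "head office",
    "temporal": "moment of speaking",
    "participant": "speaker's team",
}
shifted = apply_shift_cues("Back in 2018, things were different at the new office.", slots)
```

In this example the temporal and spatial slots change while the participant slot keeps its earlier value, which is the behaviour that context-slot freeze fails to produce.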

The third failure mode is confirmation skipping: accepting an apparent resolution without running the confirmation step, which leaves anchor-drift errors uncorrected. The correction is to budget two to three seconds after each deictic for confirmation against the subsequent speech.

For coverage of how speaker-context tracking interacts with longer-form decoding and multi-speaker discrimination, see "multi-speaker discrimination and tracking" and "meeting and conference call decoding".

Four-week training sequence

Week one establishes deictic-category recognition. Drill 30 minutes per day on short audio clips that contain a high density of deictic expressions, tagging each deictic as spatial, temporal, participant, or discourse within a second of hearing it. The target is 90-percent tagging accuracy by the end of the week on a held-out set of 200 deictic instances.

Week two extends tagging to full passages and adds context-slot maintenance. Listen to ten three-minute passages per day and maintain a scratch context map for each — spatial hypothesis, temporal frame, participant map, proposition buffer — while answering deictic-resolution questions after the passage. The target is 85-percent resolution accuracy with the residual concentrated on context-slot-freeze errors.

Week three integrates the four-step protocol with explicit question-type practice. Practice 15 deictic-resolution questions per day from the three question forms, scoring each answer with notes on which step of the protocol led to the selection. The target is for the protocol to be the source of the answer on at least 80 percent of deictic questions, with the residual concentrated on questions where the deictic was ambiguous in the audio itself.

Week four runs the protocol at section pace under timed conditions. Complete one full listening section per day with deliberate attention to deictic items, and review only the deictic-related items in the post-section debrief. The target is for deictic-resolution accuracy to match the candidate's overall section accuracy, indicating that deictic items are no longer a differential error source.
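The percentage targets above translate into simple item counts, which can be easier to track in a study log. The Python sketch below is a small illustration: the function name min_correct is hypothetical, and applying the week-three 80-percent target to a single day's set of 15 questions is an assumption made only for the sake of the arithmetic.

```python
def min_correct(target_percent: int, total_items: int) -> int:
    """Smallest number of correct items that meets the target percentage."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_percent * total_items // 100)


# Week one: 90-percent tagging accuracy on the 200-instance held-out set.
week_one = min_correct(90, 200)

# Week three: 80 percent of one day's 15 questions (assumed per-day reading).
week_three_daily = min_correct(80, 15)
```

Under these assumptions the week-one target works out to 180 correct tags out of 200, and the week-three target to 12 protocol-sourced answers out of 15 per day.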