TOEIC Link Confidence Interval and Score Band: How to Read Your Score Like an Assessment Specialist

When ETS publishes a TOEIC Link score, it publishes a single number on a 0-25 scale. What it does not publish next to that number — but what every assessment specialist treats as inseparable from it — is the confidence interval, sometimes called the score band. The confidence interval is the range within which your true ability is statistically expected to fall, given the test you took.

For test-takers, ignoring the confidence interval can lead to wasted re-test fees and misplaced confidence. For HR teams, ignoring it can lead to hiring decisions that are not statistically defensible. This article explains what the confidence interval actually means, how to derive a usable score band from any TOEIC Link result, and the operational rules for treating scores as bands rather than as point estimates.

What a confidence interval is — and what it is not

A confidence interval (CI) is a statistical statement about the precision of a measurement. It is not a statement about the probability that any individual score is "correct." When ETS reports a TOEIC Link score of 18 with a 95% CI of ±2, the correct interpretation is:

If we re-tested this person many times under similar conditions and drew a ±2 band around each observed score, about 95% of those bands would contain the person's true score.

Three things this does not mean:

It does not mean there is a 95% chance the person's "true" score is between 16 and 20. The true score is a fixed value; the CI is about the measurement's reliability, not the person's variability.
It does not mean the person's ability swings between 16 and 20 from one sitting to the next. Ability is stable over short periods; the CI captures measurement noise across hypothetical repeated tests.
It does not mean a 16 and a 20 are "the same score." They are statistically indistinguishable at the 95% level if both belong to the same person, but they are not the same score in any operational sense — see the HR section below.
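
The repeated-testing idea can be checked with a short simulation. The sketch below assumes normally distributed measurement error and continuous (unrounded) scores, neither of which is confirmed for TOEIC Link. The true score of 18 and the ±2 half-width come from the example above; the function name coverage is purely illustrative.

```python
import random

def coverage(true_score=18.0, half_width=2.0, trials=10_000, seed=0):
    """Fraction of simulated bands (observed ± half_width) that contain true_score."""
    rng = random.Random(seed)
    # Assumption: a 95% half-width corresponds to 1.96 standard errors of normal noise.
    sem = half_width / 1.96
    hits = 0
    for _ in range(trials):
        observed = rng.gauss(true_score, sem)  # one hypothetical re-test
        if observed - half_width <= true_score <= observed + half_width:
            hits += 1
    return hits / trials

print(coverage())  # should land close to 0.95
```

The true score never moves in this simulation. Only the observed score, and therefore the band drawn around it, changes from one hypothetical sitting to the next.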

Understanding this distinction is the foundation. Most misuse of TOEIC Link scores comes from collapsing the difference between the point estimate and the band.

How TOEIC Link's confidence interval is constructed

The TOEIC Link uses an adaptive testing model based on Item Response Theory (IRT), with a 2-parameter or 3-parameter logistic model depending on the section. For background on adaptive testing in TOEIC Link, see our TOEIC Link adaptive testing explained guide.

Under IRT, the standard error of measurement (SEM) is not constant across the score scale. It is smallest near the center of the ability distribution (around scores 12-18) and larger at the tails (below 8 or above 22). This is because adaptive tests gather more discriminating information about ability levels near the test's targeted difficulty range.

Practical consequence: the same nominal score difference means different things at different points on the scale.

A 16 vs an 18 difference, both in the dense center of the scale, has a relatively narrow combined uncertainty.
A 22 vs a 24 difference, both at the upper tail, has a wider combined uncertainty even though the nominal gap is the same 2 points.
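
One way to put a number on "combined uncertainty" is the standard error of a difference, which adds the two errors in quadrature. The sketch below assumes the two scores carry independent, roughly normal errors, and it borrows the rule-of-thumb half-widths given later in this article (±1.5 for 16 and 18, ±2 for 22, ±1.5 for 24). The function name is illustrative.

```python
import math

def difference_half_width(half_width_a, half_width_b):
    """95% half-width for the difference between two independent scores."""
    # Half-widths are a fixed multiple of the SEM, so they combine like SEMs do.
    return math.hypot(half_width_a, half_width_b)

center = difference_half_width(1.5, 1.5)  # 16 vs 18
upper = difference_half_width(2.0, 1.5)   # 22 vs 24
print(round(center, 2), round(upper, 2))  # prints: 2.12 2.5
```

Under these assumptions, the same 2-point gap sits closer to the noise threshold in the center of the scale than at the upper tail, and falls short of it in both cases.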

ETS does not publish conditional SEM tables (the SEM at each score level) for TOEIC Link, but the published score-report bands (which appear on official score reports as a "score range" rather than a single number) implicitly encode this scale-dependent uncertainty.

The operational score-band rule

If you do not have access to ETS's exact SEM tables, use this rule of thumb derived from the published band widths on TOEIC Link score reports:

Scores 6-12: ±2 points (95% CI)
Scores 13-19: ±1.5 points (95% CI)
Scores 20-23: ±2 points (95% CI)
Scores 24-25: ±1.5 points (95% CI, but ceiling effects compress the upper bound)

These are approximations. The precise CI on any individual score depends on which items the adaptive engine selected, but for operational decisions (HR screening, self-evaluation, retake choice) these bands are accurate enough.
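
The rule of thumb translates directly into a lookup. This is a minimal sketch, not an ETS procedure: the names RULE_OF_THUMB and score_band are illustrative, clipping to the 0-25 scale is an assumption used to model the ceiling effect, and scores below 6 raise an error because the rule above does not cover them.

```python
SCALE_MIN, SCALE_MAX = 0, 25

# (lowest score, highest score, 95% half-width) taken from the rule of thumb
RULE_OF_THUMB = [
    (6, 12, 2.0),
    (13, 19, 1.5),
    (20, 23, 2.0),
    (24, 25, 1.5),
]

def score_band(score):
    """Return (low, high) for a whole-number score, clipped to the scale."""
    for lowest, highest, half_width in RULE_OF_THUMB:
        if lowest <= score <= highest:
            low = max(SCALE_MIN, score - half_width)
            high = min(SCALE_MAX, score + half_width)  # ceiling compresses the top
            return (low, high)
    raise ValueError(f"no rule-of-thumb band for score {score}")

print(score_band(18))  # (16.5, 19.5)
print(score_band(25))  # (23.5, 25)
```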

For deeper context on how to read the official score report, see our TOEIC Link score report interpretation guide.

When a score difference matters and when it does not

Two scores are statistically distinguishable if their 95% confidence intervals do not overlap. Two scores are operationally distinguishable if the difference is large enough to support a decision.

These are different thresholds, and confusing them is the most common mistake in score-based decision-making.

Statistical distinguishability examples

17 vs 20: CIs roughly 15.5-18.5 vs 18-22. Marginal overlap. Statistically borderline.
17 vs 22: CIs roughly 15.5-18.5 vs 20-24. Clear non-overlap. Statistically distinguishable.
19 vs 20: CIs roughly 17.5-20.5 vs 18-22. Substantial overlap. Not distinguishable.
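
The overlap check can be sketched in a few lines. The half-widths follow the rule-of-thumb bands from this article (±1.5 for scores 13-19 and 24-25, ±2 otherwise); the function names are illustrative, and counting bands that merely touch as overlapping is an assumption.

```python
def half_width(score):
    """Rule-of-thumb 95% half-width for scores 6-25."""
    if not 6 <= score <= 25:
        raise ValueError("the rule of thumb covers scores 6-25 only")
    return 1.5 if 13 <= score <= 19 or score >= 24 else 2.0

def bands_overlap(score_a, score_b):
    """True if the two rule-of-thumb bands share at least one point."""
    low_a, high_a = score_a - half_width(score_a), score_a + half_width(score_a)
    low_b, high_b = score_b - half_width(score_b), score_b + half_width(score_b)
    return low_a <= high_b and low_b <= high_a

for pair in [(17, 20), (17, 22), (19, 20)]:
    print(pair, "overlap" if bands_overlap(*pair) else "no overlap")
```

Only the 17 vs 22 pair comes out as non-overlapping, which matches the verdicts in the examples above.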

Operational distinguishability — the 3-point rule

For operational use (HR screening, promotion eligibility, scholarship cutoffs), the assessment-industry standard is to treat differences of 3 or more points on the TOEIC Link 25-point scale as meaningful, and differences of 2 or fewer as inside the noise.

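This is more conservative than a formal test on the difference between two scores (which, under the rule-of-thumb bands above, flags gaps of roughly 2.1 to 2.8 points), because operational decisions need a buffer for: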

Test-day variance (sleep, stress, technical issues during the at-home test)
Item-pool variance (different test forms have slightly different difficulty profiles)
Effort variance (especially on Speaking/Writing where production effort fluctuates)

For at-home test-specific variance sources, see our TOEIC Link technical difficulties during test guide.

How HR teams should set screening thresholds

The single most common HR mistake with TOEIC Link is to set a hard cutoff at a single score — for example, "all candidates must score at least 18." This is statistically indefensible and creates legal exposure if challenged.

The correct approach is a band-based threshold with an explanation policy.

Band-based threshold structure

Tier	Score Range	Treatment
Clear pass	20+	Auto-qualify on language, proceed to next round
Band match	17-19	Qualify on language with a brief follow-up screen (one-on-one English interview, 10 minutes)
Below band	14-16	Conditional — qualify only if other application criteria are exceptional
Clear miss	<14	Decline on language

The band match tier exists specifically because of the confidence interval. Candidates who score 17-19 may have true abilities that range from 15.5 to 20.5 — overlapping with both the clear-pass and below-band tiers. A 10-minute interview resolves this uncertainty far more cheaply than a re-test.
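
For teams that screen in a spreadsheet or script, the tier table can be encoded as a simple function. This is a sketch using the example cutoffs from the table above; the function name screening_tier is illustrative, and an organization would substitute its own thresholds.

```python
def screening_tier(score):
    """Map a TOEIC Link score to the example screening tiers in the table."""
    if score >= 20:
        return "Clear pass"
    if score >= 17:
        return "Band match"   # triggers the short follow-up interview
    if score >= 14:
        return "Below band"   # conditional on other application criteria
    return "Clear miss"

for s in (21, 18, 15, 12):
    print(s, screening_tier(s))
```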

Explanation policy

Document in your hiring policy that scores within ±2 of any threshold are treated as the same for first-screen purposes. This protects the company against challenges where a candidate scoring 19 and a candidate scoring 20 are treated differently despite being statistically indistinguishable.

For target-score guidance by job role (which feeds into where to set your bands), see our TOEIC Link target score by job role guide.

When test-takers should retake — and when they should not

Test-takers often retake when they should not, and vice versa. The confidence interval gives you a clean rule.

Do not retake if

Your score is within ±1 of your target. Re-test variance alone is likely to produce the same result, and you will pay for nothing.
Your last two scores are within 1 point of each other. The signal has converged; another attempt will mostly reproduce the same noise, not raise your score.
It has been less than 4 weeks since your last test, and you have not changed your study routine. The CI captures noise, not growth, so repeated tests at the same ability give you the same band.

Do retake if

Your score is 3+ below your target. The CI does not bridge a 3-point gap; you genuinely need to improve.
You completed a structured study program (like our TOEIC Link 15-to-20 roadmap) since the last test. The CI captures measurement noise, but a real ability gain will move the entire band.
You experienced a documented test-day disruption (technical, environmental, health). The CI assumes typical conditions; an atypical session can land in the lower tail of your true band.
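
The two checklists can be combined into one helper that reports which conditions apply. This is a sketch: the function and parameter names are hypothetical, a single studied_since flag stands in for both "changed your study routine" and "completed a structured study program", and because the lists above do not say which rule wins when they conflict, the function returns both sets of reasons instead of a verdict.

```python
def retake_signals(score, target, previous_score=None, weeks_since_last=None,
                   studied_since=False, disrupted=False):
    """Return (reasons_to_retake, reasons_not_to_retake) per the two lists above."""
    in_favor, against = [], []
    if abs(score - target) <= 1:
        against.append("within 1 point of target")
    if previous_score is not None and abs(score - previous_score) <= 1:
        against.append("last two scores within 1 point")
    if weeks_since_last is not None and weeks_since_last < 4 and not studied_since:
        against.append("under 4 weeks with no change in study routine")
    if target - score >= 3:
        in_favor.append("3+ points below target")
    if studied_since:
        in_favor.append("structured study completed since last test")
    if disrupted:
        in_favor.append("documented test-day disruption")
    return in_favor, against

print(retake_signals(15, 18, previous_score=15, weeks_since_last=2))
```

In the sample call, the 3-point gap argues for a retake while the flat score history and the short interval argue against one. That mixed result is a reasonable cue to study first and retake later.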

For retake-decision logic in detail, see our TOEIC Link retake strategy guide.

Score band conversion to CEFR — preserving the band

A common error is to convert the point score to CEFR and then drop the band. This loses the precision information.

The correct approach is to convert both ends of the band, then take the union of CEFR levels covered.

TOEIC Link score band	CEFR band
8-11 (band 6-13)	A2
12-15 (band 10-17)	B1
16-20 (band 14-22)	B2
21-25 (band 19-25)	C1 / C2

A learner with a TOEIC Link score of 19 (band 17.5-20.5) is best described as "B2, with the upper edge of the band touching the B2 ceiling." Reporting this learner as "B2" without the band loses information that matters for placement decisions.
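
A band-preserving conversion can be sketched as follows. The score ranges come from the table above. Because that table is defined on whole scores, the sketch rounds fractional band edges inward (an assumption) and collects the level of every whole score inside the band, which for a contiguous band gives the same union as converting both ends. The names CEFR_TABLE and cefr_levels are illustrative.

```python
import math

# (lowest score, highest score, CEFR label) from the conversion table
CEFR_TABLE = [(8, 11, "A2"), (12, 15, "B1"), (16, 20, "B2"), (21, 25, "C1 / C2")]

def cefr_levels(low, high):
    """CEFR labels covered by the whole scores inside the band [low, high]."""
    levels = []
    for score in range(math.ceil(low), math.floor(high) + 1):
        for lowest, highest, label in CEFR_TABLE:
            if lowest <= score <= highest and label not in levels:
                levels.append(label)
    return levels

print(cefr_levels(17.5, 20.5))  # score 19 -> ['B2']
print(cefr_levels(18.0, 22.0))  # score 20 -> ['B2', 'C1 / C2']
```

A score of 19 stays inside B2, while a score of 20, one point higher, already has a band that reaches into the next level. Collapsing either to a single label hides that difference.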

For the full CEFR mapping methodology, see our TOEIC Link CEFR conversion guide.

Why this matters for at-home testing

The TOEIC Link is delivered at home under remote proctoring, which introduces additional variance sources that traditional in-center TOEIC does not have. Network jitter, microphone quality, ambient noise, and brief proctor interventions can all shift a score by 0.5-1 point in either direction.

ETS's CI accounts for typical at-home conditions, but the worst 5% of test-day environments can push a score outside the published CI. This is another reason to treat scores as bands and not point estimates: the at-home CI is wider in practice than the in-center CI from the legacy TOEIC.

For the full list of at-home variance sources and how to minimize them, see our TOEIC Link test environment guide.

Practical takeaways

Every TOEIC Link score is a band, not a point. Use the rule-of-thumb bands above (±1.5 to ±2 across the scale).
For operational decisions, treat differences of 2 or fewer points as noise; only 3+ point gaps are decision-grade.
HR teams should use band-based thresholds with a follow-up screen for the borderline tier.
Test-takers should retake only when their score is 3+ below target, when they have completed structured study since the last test, or when a documented test-day disruption affected the session.
When converting to CEFR, preserve the band rather than collapsing to a single level.

Treating scores as bands is not statistical pedantry — it is the difference between defensible decisions and arbitrary ones. Every assessment specialist does this; the goal of this article is to make it standard practice for TOEIC Link users too.