Simpson's Paradox — Winning Every Group but Losing Overall

Thank you for visiting this site. This article covers “Simpson’s Paradox.”

The world of data hides a frightening trap. Split a population into subgroups and compare them, and A may look superior in every subgroup. Pool all the data together and B comes out on top. What holds within each subgroup need not hold in the combined total. This is Simpson’s Paradox.

A Concrete Example

Suppose a university is accused of gender bias in admissions.

Engineering School

  • Men: 800 applicants, 480 admitted (60% acceptance rate)
  • Women: 100 applicants, 70 admitted (70% acceptance rate)

Liberal Arts School

  • Men: 200 applicants, 40 admitted (20% acceptance rate)
  • Women: 900 applicants, 270 admitted (30% acceptance rate)

Looking at each school individually: women have a higher acceptance rate in both schools.

But combine all applicants:

  • All men: 1,000 applicants, 520 admitted (52% acceptance rate)
  • All women: 1,000 applicants, 340 admitted (34% acceptance rate)

Wait — men have a higher overall acceptance rate!

Women outperform men in every individual school, yet men lead overall. The numbers do not lie; the conclusion simply reverses. This is Simpson’s Paradox.
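
The arithmetic above can be checked with a short Python sketch. The counts come straight from the example; the names admissions, acceptance_rate, and pooled_rate are illustrative choices made here, not part of any library.

```python
# Admissions counts from the example above: (applicants, admitted).
admissions = {
    "Engineering": {"men": (800, 480), "women": (100, 70)},
    "Liberal Arts": {"men": (200, 40), "women": (900, 270)},
}


def acceptance_rate(applicants, admitted):
    """Return the fraction of applicants who were admitted."""
    return admitted / applicants


# Rate within each school, computed separately for men and women.
per_school = {
    school: {gender: acceptance_rate(*counts) for gender, counts in groups.items()}
    for school, groups in admissions.items()
}


def pooled_rate(gender):
    """Sum applicants and admits across schools first, then divide."""
    applicants = sum(groups[gender][0] for groups in admissions.values())
    admitted = sum(groups[gender][1] for groups in admissions.values())
    return acceptance_rate(applicants, admitted)


overall = {gender: pooled_rate(gender) for gender in ("men", "women")}

for school, rates in per_school.items():
    print(f"{school}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"Overall: men {overall['men']:.0%}, women {overall['women']:.0%}")
# Engineering: men 60%, women 70%
# Liberal Arts: men 20%, women 30%
# Overall: men 52%, women 34%
```

The only difference between the two views is the order of operations: dividing within each school and then comparing, versus summing across schools and then dividing.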

Why the Reversal Happens

The key is the distribution of applicants across schools.

In this example, men apply predominantly to the higher-acceptance Engineering School (800 applicants), while women apply predominantly to the lower-acceptance Liberal Arts School (900 applicants).

Men concentrate in the easier school; women concentrate in the harder school. Even though women outperform within each school, the overall acceptance rate is heavily influenced by which school each group applies to.

The hidden third variable (in this case, choice of school) is called a confounding variable: it is related both to the groups being compared (gender) and to the outcome (admission). Ignore the confounding variable and aggregate naively, and Simpson’s Paradox can appear.
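
One way to see the confounder at work is to treat each overall rate as a weighted average of the per-school rates, with each school weighted by its share of applicants. The sketch below first reproduces the pooled figures from each group's own application mix, then recomputes them under a hypothetical common mix (all applicants combined). That second step is an illustrative what-if, not something the example itself claims; the names weighted_rate, common_mix, and standardized are assumptions introduced here.

```python
# Per-school acceptance rates and applicant counts from the example above.
rates = {
    "men": {"Engineering": 0.60, "Liberal Arts": 0.20},
    "women": {"Engineering": 0.70, "Liberal Arts": 0.30},
}
applicants = {
    "men": {"Engineering": 800, "Liberal Arts": 200},
    "women": {"Engineering": 100, "Liberal Arts": 900},
}


def weighted_rate(school_rates, school_weights):
    """Average the per-school rates, weighting each school by its applicant count."""
    total = sum(school_weights.values())
    return sum(school_rates[s] * school_weights[s] / total for s in school_rates)


# Each group's own application mix reproduces the pooled figures (52% and 34%).
observed = {g: weighted_rate(rates[g], applicants[g]) for g in rates}

# Hypothetical what-if: both groups apply with the same mix (all applicants combined).
common_mix = {
    s: applicants["men"][s] + applicants["women"][s] for s in rates["men"]
}
standardized = {g: weighted_rate(rates[g], common_mix) for g in rates}

print(f"Observed:     men {observed['men']:.0%}, women {observed['women']:.0%}")
print(f"Standardized: men {standardized['men']:.0%}, women {standardized['women']:.0%}")
# Observed:     men 52%, women 34%
# Standardized: men 38%, women 48%
```

With the application mix held equal, the reversal disappears and women lead by the same ten points seen inside each school, which suggests the pooled gap comes from where people applied rather than from how they were treated.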

A Real Historical Case

This is not merely a theoretical curiosity. Simpson’s Paradox has caused real-world confusion.

In 1973, graduate admissions at UC Berkeley showed an overall admission rate of about 44% for men and 35% for women, raising suspicions of gender discrimination. But a department-by-department analysis found that most departments showed no significant bias against women, and the department-level data, if anything, slightly favored women.

The explanation: women disproportionately applied to highly competitive departments with low acceptance rates. This real example is now a standard illustration in statistics textbooks.

Lessons for Data Analysis

The most important lesson Simpson’s Paradox teaches: the conclusion can change depending on which level of aggregation you examine.

Looking only at overall figures can lead to a conclusion that is the exact opposite of what every subgroup shows. But subdividing too finely leaves each subgroup with too few observations for reliable estimates.

The key habit is always asking: “Is there a hidden confounding variable in this data?” Blindly trusting simple aggregate statistics is especially dangerous in business decision-making and policy.

In medicine, Simpson’s Paradox frequently arises when evaluating treatments. If severe and mild patients are assigned to treatments in unequal proportions, the combined analysis can make a good treatment look bad. This is one reason why randomized controlled trials (RCTs) are the gold standard in medical research: random assignment balances confounders across groups on average, including ones nobody thought to measure.
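
The balancing effect of randomization can be illustrated with a small simulation. Everything in this sketch is hypothetical: the cohort size, the 30% share of severe cases, and the names patients, severe_share, and shares are assumptions chosen for illustration, not data from any study.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical cohort: 10,000 patients, each severe with probability 0.3.
patients = [{"severe": random.random() < 0.3} for _ in range(10_000)]

# Randomized assignment: a fair coin decides each patient's arm,
# so severity cannot influence which arm a patient ends up in.
for patient in patients:
    patient["arm"] = random.choice(["treatment", "control"])


def severe_share(arm):
    """Fraction of severe cases among the patients assigned to the given arm."""
    members = [p for p in patients if p["arm"] == arm]
    return sum(p["severe"] for p in members) / len(members)


shares = {arm: severe_share(arm) for arm in ("treatment", "control")}

for arm, share in shares.items():
    print(f"{arm}: {share:.1%} severe")
```

Both arms should come out close to the 30% of severe cases in the cohort, so neither treatment is handicapped by a sicker patient mix. Compare this with the admissions example, where the application mix differed sharply between the two groups.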

Summary

This article covered “Simpson’s Paradox.”

Statistics is a powerful tool when used correctly, but can become a weapon for disinformation when misused. Whenever you read data, the habit of pausing to ask “is this the right way to aggregate?” is worth cultivating — and this paradox shows exactly why.

To return to the full list of paradoxes, follow the link below.

Thank you for reading. We hope to see you in the next article.

World's Paradoxes — The Complete List: Philosophy, Math, Physics & Economics (en.senkohome.com/paradox-list/)