Strategic Thinking: Bayesian Inference — Updating Your Probability Estimates with New Evidence

This article explains Bayesian inference.

Every time new information arrives, update your probability estimate to reflect it. Tracing back to the writings of eighteenth-century clergyman Thomas Bayes, this way of thinking is now widely used in modern machine learning, medical diagnosis, spam filtering, and scientific reasoning.

Bayes and the History of His Theorem

The theorem’s namesake is Thomas Bayes (1702–1761). An English Nonconformist minister with a deep interest in mathematics and philosophy, he grappled with the question “what is probability?”

Bayes never published his findings in his lifetime. After his death, his friend Richard Price found the manuscript among his papers, edited it, and had it published in 1763 in the Philosophical Transactions of the Royal Society as “An Essay towards solving a Problem in the Doctrine of Chances.”

At the time, the paper attracted little attention. About a decade later, in 1774, Pierre-Simon Laplace independently arrived at a similar result, and in France it was long known as “Laplace’s theorem.” Bayes’ name became widely recognized only in the late twentieth century, when the spread of computers made Bayesian computation practical.

Deriving Bayes’ Theorem

Bayes’ theorem can be derived from the basic properties of conditional probability.

The joint probability P(A,B) of events A and B can be expressed in two ways:

P(A,B) = P(A|B) × P(B) (probability of A given B × probability of B)

P(A,B) = P(B|A) × P(A) (probability of B given A × probability of A)

Setting the two right-hand sides equal and solving for P(A|B):

P(A|B) = P(B|A) × P(A) / P(B)

This is Bayes’ theorem.

In the context of inference:

P(A): Prior probability — degree of belief in hypothesis A before seeing evidence
P(B|A): Likelihood — probability of observing evidence B if hypothesis A is true
P(B): Marginal probability — probability of observing evidence B (a weighted average across all hypotheses)
P(A|B): Posterior probability — degree of belief in hypothesis A after seeing evidence B

This can be expressed concisely as: Posterior ∝ Prior × Likelihood (∝ means “proportional to”).
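
The formula can be turned into a few lines of code. The following is a minimal sketch for a two-way hypothesis (A is either true or false); the function name `posterior` and its parameter names are illustrative choices, not part of any library. P(B) is expanded as the weighted average over both cases, as described above.

```python
def posterior(prior: float, likelihood: float, likelihood_if_false: float) -> float:
    """Return P(A|B) for a hypothesis A that is either true or false.

    prior               -- P(A)
    likelihood          -- P(B|A)
    likelihood_if_false -- P(B|not A)
    """
    # Marginal probability P(B): weighted average over "A true" and "A false".
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the evidence has zero probability under both cases")
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    return likelihood * prior / evidence


# Illustrative numbers only: a 50% prior, evidence four times likelier if A is true.
print(round(posterior(0.5, 0.8, 0.2), 3))
```

With an even prior, the posterior is driven entirely by the ratio of the two likelihoods; the examples that follow show what changes when the prior is far from 50%.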

Medical Diagnosis: A Concrete Example

The classic example where Bayes’ theorem produces counterintuitive results is medical diagnosis. Let us work through the numbers.

Assumptions:

Disease prevalence (fraction of the population that has the disease): 1%
Test sensitivity (probability a diseased person tests positive): 95%
Test specificity (probability a healthy person tests negative): 95%

If a test comes back positive, what is the probability that the person actually has the disease (positive predictive value)?

Most people answer “90–95%, since the test is 95% accurate.” Let us calculate the correct answer.

For a population of 10,000 people:

People with the disease: 100 (1% prevalence)
- Positive (true positive): 100 × 0.95 = 95
- Negative (false negative): 5
Healthy people: 9,900
- Negative (true negative): 9,900 × 0.95 = 9,405
- Positive (false positive): 9,900 × 0.05 = 495

Total positives: 95 + 495 = 590
Of those, actually diseased: 95

Positive predictive value = 95 / 590 ≈ 16%

Even with a 95% accurate test, a positive result means only about a 16% chance of actually having the disease.
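
The same figure follows from Bayes’ theorem directly, without counting people. In the worked equation below, D stands for “has the disease” and + for “tests positive”; these symbols are introduced here only for illustration.

```latex
\begin{aligned}
P(D \mid +) &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} \\
            &= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \\
            &= \frac{0.0095}{0.0590} \approx 0.161
\end{aligned}
```

The denominator is the marginal probability of a positive test, and the false-positive term (0.0495) is about five times larger than the true-positive term (0.0095).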

Why? Because disease prevalence (the prior probability) is low (1%), the vast majority of the population is healthy. Even a 5% false positive rate across 9,900 healthy people generates 495 false positives, which swamps the 95 true positives.

The lesson: when the prior probability (prevalence, base rate) is low, even a highly accurate test produces mostly false positives.
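
To see how strongly the result depends on the base rate, the calculation can be repeated for several prevalence values. This is a small sketch; the function name `positive_predictive_value` and the prevalence values other than 1% are assumptions chosen for illustration, while the 95% sensitivity and specificity come from the example above.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease given a positive test, via Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Same test (95% sensitivity, 95% specificity), different base rates.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence, 0.95, 0.95)
    print(f"prevalence {prevalence:.1%} -> positive predictive value {ppv:.1%}")
```

The test itself never changes in this loop; only the prior does, and the meaning of a positive result changes with it.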

This example connects directly to the cognitive bias of “base rate neglect” — humans tend to ignore prior probabilities and focus only on the test result or immediate evidence.

Related article: Strategic Thinking: Behavioral Economics - Understanding How Humans Systematically Deviate from Rationality (en.senkohome.com/strategic-thinking-behavioral-economics/)

Bayesian Updating with Multiple Pieces of Evidence

A powerful property of Bayesian inference is sequential updating: each time new evidence arrives, “the posterior becomes the next prior.”

Consider diagnosing whether a factory machine is broken.

Initial state: Prior probability = “10% chance of being broken.”

First evidence: An unusual noise is detected. Probability of noise given a working machine: 5%; probability of noise given a broken machine: 80%.

Bayesian update:

Broken: prior × likelihood of noise = 0.10 × 0.80 = 0.080
Working: prior × likelihood of noise = 0.90 × 0.05 = 0.045
Total: 0.125
Posterior probability of broken: 0.080 / 0.125 = 64%

The noise evidence raises the probability of being broken from 10% to 64%.

Second evidence: Vibration is measured as within normal range. Probability of normal vibration given a working machine: 90%; probability of normal vibration given a broken machine: 30%.

Bayesian update (prior is now 64%):

Broken: prior × likelihood of normal vibration = 0.64 × 0.30 = 0.192
Working: prior × likelihood of normal vibration = 0.36 × 0.90 = 0.324
Total: 0.516
Posterior probability of broken: 0.192 / 0.516 ≈ 37%

The normal vibration reading brings the probability back down from 64% to 37%.

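In this way, Bayesian inference works as a step-by-step learning process: each new piece of evidence updates the probability, and the estimate becomes better informed as evidence accumulates. Feeding each posterior in as the next prior, as done here, assumes the pieces of evidence are independent of one another once the machine’s true state is known.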
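
The two updates for the factory machine can be reproduced with a short loop in which each posterior is fed back in as the next prior. This is a sketch only; the function name `update` and the list layout are illustrative, and the probabilities are the ones from the example above.

```python
def update(prior_broken: float, like_if_broken: float, like_if_working: float) -> float:
    """One Bayesian update of the probability that the machine is broken."""
    broken = prior_broken * like_if_broken
    working = (1.0 - prior_broken) * like_if_working
    return broken / (broken + working)


# (label, P(evidence | broken), P(evidence | working)), taken from the example.
evidence = [
    ("unusual noise", 0.80, 0.05),
    ("normal vibration", 0.30, 0.90),
]

belief = 0.10          # initial prior: 10% chance of being broken
history = [belief]
for label, like_broken, like_working in evidence:
    # Assumes each observation is independent of the others given the machine's state.
    belief = update(belief, like_broken, like_working)
    history.append(belief)
    print(f"after {label}: {belief:.0%}")
```

Under the same independence assumption, the order of the observations does not matter: processing the vibration reading first and the noise second ends at the same final probability.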

Frequentism vs. Bayesianism

Statistics has two major schools of thought.

Frequentism: Defines probability as “the limiting frequency of an event in infinitely many repetitions of the same experiment.”

“The probability of heads is 0.5” refers to the proportion of heads that would result if the same coin were flipped infinitely many times
Parameters (e.g., population mean) are fixed true values; data are random variables
Main inference tools: hypothesis testing (p-values), confidence intervals

Bayesianism: Defines probability as “the degree of belief that a proposition is true.”

Parameters themselves have probability distributions
Set a prior distribution, observe data, update to a posterior distribution
Main inference tools: Bayes factors, credible intervals

Key differences:

Aspect	Frequentism	Bayesianism
Definition of probability	Long-run frequency	Degree of belief
Parameters	Fixed true values	Have probability distributions
Prior information	Not used	Explicitly incorporated as prior distribution
Interval interpretation	Confidence interval (contains true value 95% of the time in the long run)	Credible interval (95% posterior probability)
Probability of a one-time event	Not definable	Definable

Weather forecasts like “40% chance of rain tomorrow” are most naturally read in Bayesian terms. “Tomorrow’s weather” happens only once, so its probability is hard to define as a long-run frequency; the Bayesian interpretation is “degree of belief that it rains given the current atmospheric data.”
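
To make “parameters themselves have probability distributions” concrete, here is a rough sketch that treats a coin’s heads probability as unknown, starts from a uniform prior, and computes the posterior on a grid of candidate values. The data (7 heads, 3 tails), the grid size, and the function names are all assumptions made for illustration; a grid is only one simple way to approximate the posterior.

```python
def grid_posterior(heads: int, tails: int, grid_size: int = 1001):
    """Posterior over the heads probability on an evenly spaced grid (uniform prior)."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior x binomial likelihood (the constant binomial coefficient cancels out).
    weights = [t ** heads * (1.0 - t) ** tails for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def credible_interval(thetas, probs, mass: float = 0.95):
    """Equal-tailed interval containing `mass` of the posterior probability."""
    tail = (1.0 - mass) / 2.0
    lower, upper = thetas[0], thetas[-1]
    found_lower = False
    cumulative = 0.0
    for theta, prob in zip(thetas, probs):
        cumulative += prob
        if not found_lower and cumulative >= tail:
            lower, found_lower = theta, True
        if cumulative >= 1.0 - tail:
            upper = theta
            break
    return lower, upper


thetas, probs = grid_posterior(heads=7, tails=3)   # hypothetical data
mean = sum(t * p for t, p in zip(thetas, probs))
low, high = credible_interval(thetas, probs)
print(f"posterior mean {mean:.3f}, 95% credible interval [{low:.2f}, {high:.2f}]")
```

The interval printed here is a direct probability statement about the parameter given the data and the prior, which is the credible-interval reading in the table above.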

Spam Filters and Naïve Bayes

Email spam filters are one of the most widespread practical applications of Bayesian inference.

The method known as the Naïve Bayes Classifier works as follows.

For each word, the classifier learns in advance “the probability of this word appearing in a spam email” and “the probability of appearing in a legitimate email.”

When a new email arrives, each word’s occurrence is treated as a piece of evidence that is independent of the other words once the class (spam or legitimate) is known. This independence assumption is the “naïve” part. Bayesian updates are then applied word by word to calculate “the probability that this email is spam.”

“Free,” “now,” “winner,” “click” appear more often in spam → raise the spam probability
“Meeting,” “report,” “best regards” appear more often in legitimate mail → lower the spam probability

Because Naïve Bayes is computationally simple and easy to implement, it remains a standard algorithm in many machine learning tasks including text classification, sentiment analysis, and medical diagnosis.
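
The word-by-word updating can be sketched in a few lines. Everything numeric below is hypothetical: the word probabilities are made-up values standing in for what a real filter would learn from training mail, and the 50% prior is an assumption. For simplicity, this sketch only uses words that are present in the message and ignores unknown words, which is one common simplification rather than the only formulation.

```python
import math

# Hypothetical learned values: word -> (P(word | spam), P(word | legitimate)).
WORD_PROBS = {
    "free": (0.30, 0.03),
    "winner": (0.10, 0.005),
    "click": (0.20, 0.04),
    "meeting": (0.02, 0.20),
    "report": (0.03, 0.15),
}


def spam_probability(words, prior_spam: float = 0.5) -> float:
    """Combine per-word evidence under the naive independence assumption."""
    # Work in log-odds so that many small probabilities do not underflow.
    log_odds = math.log(prior_spam / (1.0 - prior_spam))
    for word in words:
        if word in WORD_PROBS:
            p_spam, p_legit = WORD_PROBS[word]
            log_odds += math.log(p_spam / p_legit)  # one Bayesian update per word
    return 1.0 / (1.0 + math.exp(-log_odds))


print(f"{spam_probability(['free', 'winner', 'click']):.3f}")
print(f"{spam_probability(['meeting', 'report']):.3f}")
```

Adding log likelihood ratios is the same as multiplying the likelihoods one update at a time; the log form is used only for numerical stability.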

Bayesian Thinking and Cognitive Biases

Bayesian inference provides a framework for identifying cognitive biases that humans frequently exhibit.

Base Rate Neglect: The tendency to ignore prior probabilities (base rates) and judge solely from immediate evidence. As shown in the medical diagnosis example, failing to consider disease prevalence leads to drastic misinterpretation of a positive test.

Confirmation Bias: The tendency to actively seek evidence that supports one’s existing beliefs (assigning high likelihood to it) while ignoring or downplaying contradictory evidence. In Bayesian terms, all evidence — including that which challenges your hypothesis — should be honestly reflected in the likelihood.

Anchoring Bias: Initial information becomes fixed like a prior and is not updated sufficiently in response to subsequent evidence.

Hindsight Bias: The tendency after the fact to think “I knew it all along.” In memory, the probability distribution held before the outcome was known is retroactively rewritten as “that was the right answer.”

All of these biases can be explained as failures of correct Bayesian updating: ignoring the prior probability, distorting the likelihood, or updating insufficiently.

Practical Bayesian Decision-Making

A practical procedure for applying Bayesian inference to decision-making:

Step 1 — Set the prior probability: Explicitly quantify “how likely is this hypothesis (candidate, cause, risk) to occur?” Past data, industry statistics, and expert knowledge can inform this.

Step 2 — Assess the likelihood: Evaluate “if this hypothesis is true, how likely is the observed evidence?” Test sensitivity, forecast model accuracy, and expert testimony reliability are all likelihoods.

Step 3 — Calculate (or approximate) the posterior probability: Estimate the posterior from the relative magnitude of prior × likelihood. Even without precise calculation, comparing “hypothesis A prior × likelihood” to “hypothesis B prior × likelihood” enables relative judgments.

Step 4 — Collect additional evidence and update: Do not treat the posterior as fixed. Update every time new evidence arrives. “Once decided, don’t change” is not Bayesian; “update incrementally as evidence accumulates” is the rational attitude.

Step 5 — Set a decision threshold: Be aware of the threshold “at what probability do I take action?” In medicine: “run confirmatory tests if positive probability exceeds 50%.” In investment: “proceed if probability of expected return exceeding cost exceeds 60%.”
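
Steps 1 through 3 and Step 5 can be sketched as a small comparison between competing hypotheses followed by a threshold check. The hypothesis labels, priors, likelihoods, and the 60% threshold below are illustrative assumptions, and the function names are not from any library.

```python
def posteriors(priors: dict, likelihoods: dict) -> dict:
    """Normalize prior x likelihood across a set of competing hypotheses."""
    scores = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(scores.values())
    return {h: score / total for h, score in scores.items()}


def should_act(probability: float, threshold: float) -> bool:
    """Step 5: act only when the posterior exceeds the chosen threshold."""
    return probability > threshold


# Step 1: priors. Step 2: how likely the observed evidence is under each hypothesis.
priors = {"A": 0.30, "B": 0.70}        # hypothetical values
likelihoods = {"A": 0.60, "B": 0.10}   # hypothetical values

# Step 3: posterior from the relative size of prior x likelihood.
post = posteriors(priors, likelihoods)
for hypothesis, probability in post.items():
    print(f"{hypothesis}: {probability:.0%}, act: {should_act(probability, 0.60)}")
```

Step 4 corresponds to calling `posteriors` again with the current posterior as the new prior whenever further evidence arrives.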

Summary

This article explained Bayesian Inference. We hope it was useful.

Bayesian inference is not merely a statistical technique; it is a framework for rational epistemology, summed up as “when new evidence arrives, update your beliefs to reflect it.” The process of explicitly holding a prior, updating it each time evidence is observed, and expressing the resulting belief as a posterior probability is useful across science, medicine, and decision-making.

While frequentist hypothesis testing asks “could the data have arisen by chance?”, Bayesian inference asks directly “how much should we believe the hypothesis?” This difference in framing is the source of Bayesian inference’s practical usefulness for decision-making.

Bayesian inference also provides the foundation for rational belief updating in situations of information asymmetry — negotiations, insurance, or auctions where the other party’s type is unknown.