Day99: The Bayesian & the Frequentist

Posted by csiu on June 3, 2017 | with: 100daysofcode

Today I started reading “Constraints versus Priors”, a paper pointing out differences between Bayesian and frequentist approaches to quantifying uncertainty.

Stark P (2015) Constraints versus Priors. SIAM/ASA J. Uncertainty Quantification, 3: 586–598.

This paper is turning out to be a delightful read, so I thought I'd point out a few excerpts I found interesting.

Terminology

For a deeper understanding of the paper, I looked up a few (unfamiliar) concepts.

  • Hilbert space - a generalization of Euclidean space that may be infinite-dimensional (a complete inner product space)
  • norm - a function that assigns a non-negative length or size to each vector in a vector space, and is zero only for the zero vector
  • Laplace’s Principle of Insufficient Reason - If there is no reason to believe that outcomes are not equally likely, assume that they are equally likely.


Use of constraints

According to the paper, constraints assert that a parameter (describing the state of the world) lies in a known set, for example that its norm is at most a constant. The frequentist and the Bayesian use constraints differently:

The frequentist uses constraints directly by restricting the set of parameters that satisfy the constraints.

whereas

The Bayesian chooses a prior probability distribution that assigns probability 1 to the set of parameters that satisfy the constraint.
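To make that distinction concrete for myself, here is a minimal sketch (my own toy example, not from the paper), assuming a single observation x ~ Normal(theta, 1) with the constraint that theta lies in [-1, 1]: the frequentist restricts the parameter set and projects the estimate back into it, while the Bayesian puts a prior that assigns probability 1 to [-1, 1] (here, uniform) and works with the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (not from the paper): one observation x ~ Normal(theta, 1),
# with the constraint that theta lies in [-1, 1].
theta_true = 0.8
x = rng.normal(theta_true, 1.0)

# Frequentist use of the constraint: restrict the parameter set directly;
# the constrained MLE is the raw estimate projected onto [-1, 1].
theta_hat = np.clip(x, -1.0, 1.0)

# Bayesian use of the constraint: a prior that puts probability 1 on [-1, 1]
# (here, uniform on the interval), then compute the posterior on a grid.
grid = np.linspace(-1.0, 1.0, 2001)           # support allowed by the constraint
dx = grid[1] - grid[0]
prior = np.ones_like(grid)                    # uniform prior on [-1, 1]
likelihood = np.exp(-0.5 * (x - grid) ** 2)   # Normal(theta, 1) likelihood
posterior = prior * likelihood
posterior /= (posterior * dx).sum()           # normalize to integrate to 1

posterior_mean = (grid * posterior * dx).sum()

print(f"observation x            = {x:.3f}")
print(f"constrained MLE          = {theta_hat:.3f}")
print(f"posterior mean (uniform) = {posterior_mean:.3f}")
```

Same constraint, two very different uses: one shrinks the set of candidate parameters, the other turns the constraint into a probability distribution over that set.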


The Bayesian approach

The Bayesian approach starts with (1) a prior probability distribution on the parameters and (2) a likelihood function. The information in the prior and the data is combined in the posterior distribution.
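In symbols this is just Bayes' rule (posterior ∝ likelihood × prior). As a reminder to myself, here is the textbook Beta-Binomial case, where the update has a closed form; the prior and data numbers below are made up.

```python
# Textbook Beta-Binomial update (made-up numbers):
# posterior ∝ likelihood × prior, and with a Beta prior and a binomial
# likelihood the posterior is again a Beta distribution (conjugacy).
a_prior, b_prior = 2, 2        # Beta(2, 2) prior on the success probability
heads, tails = 7, 3            # observed data: 7 successes, 3 failures

a_post = a_prior + heads       # Beta posterior parameters
b_post = b_prior + tails

posterior_mean = a_post / (a_post + b_post)
print(f"posterior is Beta({a_post}, {b_post}); mean = {posterior_mean:.3f}")
```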

The author notes there are a number of arguments for the Bayesian approach:

  1. People are in fact Bayesian (false)
  2. People should be Bayesian (else “incoherent”)
  3. The choice of the prior does not matter (much), because the data eventually overwhelm the prior

The last argument, that the choice of the prior does not matter, says that if we have enough data then the prior washes out. The sceptic in me wants to know: how do we tell when we have enough data?
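Here is a quick simulation of that skeptical question (my own toy example, not from the paper): two analysts start with strongly conflicting Beta priors for the same coin, and the gap between their posterior means shrinks as the flips pile up; how much data counts as "enough" clearly depends on how far apart the priors were to begin with.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two analysts with very different Beta priors for the same coin
# (toy setup); the true probability of heads is 0.3.
p_true = 0.3
priors = {"optimist": (20, 2), "pessimist": (2, 20)}   # Beta(a, b) priors

flips = rng.random(10_000) < p_true                    # simulated coin flips

for n in (10, 100, 1_000, 10_000):
    heads = int(flips[:n].sum())
    # Beta(a, b) prior + n flips with `heads` successes -> posterior mean
    means = [(a + heads) / (a + b + n) for a, b in priors.values()]
    print(f"n={n:>6}: posterior means = {means[0]:.3f} vs {means[1]:.3f} "
          f"(gap {abs(means[0] - means[1]):.3f})")
```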

Another insightful comment I found was:

[T]he fact that you can compute something does not automatically make the answer meaningful, relevant, or useful: The world is full of “quantifauxcation,” which consists of assigning a meaningless number to something, then concluding that because the result is quantitative, it must be meaningful.

For instance, let's say I give a person a 6 and divide that by 2, so it is now 3. What does this mean? Not much.

According to the paper, in Bayesian statistics the prior probabilities quantify the degree of belief.

I have never seen a Bayesian analysis of real data in which the data analyst made a serious attempt to quantify her beliefs using a prior.

Say what?!

Two analysts can have very different priors and both be right

The relationship between the prior and the world has no bearing on whether I am right or wrong

… combined with “choice of the prior does not matter”, the frequentist approach is looking pretty sweet.

The frequentist approach

In the frequentist approach, probabilities are used to describe long-term regularities: the limiting relative frequency of an outcome in repeated trials that are essentially identical.

Defining “essentially identical” is a serious problem

… nothing can be perfect *sigh*
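Still, to see what those long-term regularities buy you, here is a minimal coverage simulation (my own toy example, not from the paper): the "95%" of a 95% confidence interval is a property of the procedure over repeated trials, not of any single interval.

```python
import numpy as np

rng = np.random.default_rng(2)

# Frequentist probability as a long-run property of the procedure (toy example):
# a 95% confidence interval for a normal mean with known sigma should cover
# the true mean in roughly 95% of repeated trials.
mu_true, sigma, n, trials = 5.0, 2.0, 25, 10_000
z = 1.96                                   # ~95% two-sided normal quantile
half_width = z * sigma / np.sqrt(n)        # interval half-width (known sigma)

covered = 0
for _ in range(trials):
    sample_mean = rng.normal(mu_true, sigma, n).mean()
    covered += (sample_mean - half_width <= mu_true <= sample_mean + half_width)

print(f"empirical coverage over {trials} trials: {covered / trials:.3f}")
```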


In summary:

A Bayesian can give a number for the probability that a particular hypothesis is true; a frequentist thinks hypotheses are either true or false