
De Moivre-Laplace Theorem

An elegant alternative to calculating a cumbersome binomial distribution: the work of De Moivre, Laplace, and Gauss leverages an approach that makes life easier.

If we flip a coin, we know its probability of heads: p = 1/2. However, when running live experiments, the observed frequency doesn't sit exactly at 1/2 but deviates slightly. The Law of Large Numbers states that with a high enough number of experiments n, the observed frequency gradually converges to 1/2.

What if we want to know the probability of a specific outcome? For example, if we flip a coin 10,000 times, how likely is it that we get exactly 5,040 heads? We would use the binomial probability to determine this, which is defined as follows.

P(k\text{ heads}) = \binom{n}{k}p^{k}(1-p)^{n-k}

Approaching this problem by hand (or even with a computer) is cumbersome with an n as high as 10,000. Much of mathematics is distilling an existing formula into a simpler one (think of the Fourier series), and this problem is no exception.
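For reference, here is a minimal Python sketch of the formula above; even this has to work in log space with math.lgamma, because the raw binomial coefficient for n = 10,000 (roughly 10^3008) overflows ordinary floating point:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials), computed in log space to avoid overflow."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1 - p))

print(binom_pmf(5040, 10000, 0.5))  # ≈ 0.0058: the chance of exactly 5,040 heads
```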

Plotting the binomial distribution of 10,000 coin flips looks like the graph below, so any simpler equation we produce needs to reproduce this shape. The graph takes a familiar form known as the bell curve, which gives us a starting point.

[Figure: the binomial distribution of 10,000 coin flips, forming a bell-shaped curve]

Our binomial distribution uses discrete increments (whole numbers of heads) while the bell curve is a continuous function. But when you chart the outcomes as a histogram with a sufficiently high n, the bars trace out what looks like a smooth curve, much like the limiting processes behind derivatives and integrals.
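To see the bell shape emerge directly from the discrete outcomes, here is a rough text-histogram sketch (a smaller n = 50 keeps the output readable):

```python
import math

# Binomial(n=50, p=0.5): the discrete probabilities already trace out a bell shape.
n, p = 50, 0.5
for k in range(15, 36):
    pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k:3d} {'#' * round(pmf * 300)}")
```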

Now the goal becomes finding a function that fits this shape: the normal distribution, which is much easier to work with.

Through trial and error, mathematicians landed on the following as the best candidate:

f(x)=e^{-x^{2}}

It closely mirrors the shape we are looking to achieve.

[Figure: the graph of f(x) = e^{-x^2}, a bell-shaped curve]

The issue is that probabilities must satisfy 0 ≤ P(A) ≤ 1 and sum to 1 overall, but this curve's total area does not equal 1. Essentially, we want to find the function that does the following:

\int_{-\infty}^{\infty} f(x)\,dx = 1

This is a scaling exercise: to rescale the curve into a probability distribution, we first need its total area, and a definite integral produces exactly that once we supply the boundaries.

\int_{-\infty}^{\infty} e^{-x^2}\,dx = \;?

This integral can't be solved the usual way because e^{-x^2} has no elementary antiderivative. Said differently, there is no finite combination of polynomials, exponentials, logarithms, trig functions, or roots whose derivative equals the integrand. So the clever idea was to square the integral and turn this into a two-dimensional problem: the Gaussian integral trick.

We start by defining the integral:

I=\int_{-\infty}^{\infty} e^{-x^2}\,dx

Then we square the integral in a manner that produces a double integral.

I^{2}=\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\,dx\,dy
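This works because the x-integral is a constant with respect to y, so it can slide inside the y-integral, and the two exponentials combine into one:

e^{-x^{2}}\,e^{-y^{2}} = e^{-(x^{2}+y^{2})}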

Above, the squared integral measures the volume under the surface e^{-(x^2+y^2)} across both the x and y axes. The x^2+y^2 term immediately tells us there is a circle at play, as shown in the following:

[Figure: the surface e^{-(x^2+y^2)} and its circular level curves in the xy-plane]

The circle also signals the need for polar coordinates. Since x^2+y^2=r^2, every point can be described by a radius r and an angle θ, so we rewrite our x and y coordinates accordingly.

x=r\cos\theta \quad\rightarrow\quad y=r\sin\theta \quad\rightarrow\quad dx\,dy = r\,dr\,d\theta
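The extra factor of r in the area element is the Jacobian of the polar substitution; writing it out:

dx\,dy = \left|\det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}\right| dr\,d\theta = \left(r\cos^{2}\theta + r\sin^{2}\theta\right) dr\,d\theta = r\,dr\,d\theta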

We plug these values back into our double integral and get:

I^{2} = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2}\, r\, dr\, d\theta

We solve the inner integral first:

\int_{0}^{\infty} re^{-r^2}\,dr

We'll use u-substitution to simplify this integration, converting r^2 to u:

u=r^2 \quad\rightarrow\quad \frac{du}{dr}=2r \quad\rightarrow\quad du=2r\,dr

This positions us to swap out r dr and simplify the integration:

r\,dr=\frac{1}{2}\,du
\int_{0}^{\infty} e^{-u}\cdot\frac{1}{2}\,du=\frac{1}{2}\int_{0}^{\infty} e^{-u}\,du

The integral of e^{-u} from 0 to ∞ is 1, so the inner integral evaluates to 1/2, which we carry into the outer integral over θ.
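Spelling that evaluation out:

\frac{1}{2}\int_{0}^{\infty} e^{-u}\,du = \frac{1}{2}\Big[-e^{-u}\Big]_{0}^{\infty} = \frac{1}{2}\left(0-(-1)\right) = \frac{1}{2}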

I^{2} = \int_{0}^{2\pi} \left(\frac{1}{2}\right) d\theta = \frac{1}{2}\cdot 2\pi = \pi
I^{2}=\pi \quad\rightarrow\quad I=\sqrt{\pi}
\int_{-\infty}^{\infty} e^{-x^2}\,dx=\sqrt{\pi}
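A quick numerical sanity check of this result in Python (a crude midpoint-rule sum; the tails beyond |x| = 10 contribute essentially nothing):

```python
import math

# Approximate the integral of e^(-x^2) over [-10, 10] with the midpoint rule.
steps = 100_000
a, b = -10.0, 10.0
h = (b - a) / steps
area = sum(math.exp(-((a + (i + 0.5) * h) ** 2)) for i in range(steps)) * h
print(area, math.sqrt(math.pi))  # both ≈ 1.7724538...
```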

We now re-express this back as a function. To turn it into a proper probability distribution that integrates to 1, we divide by √π.

f(x)=\frac{1}{\sqrt{\pi}}e^{-x^2}

Now that we have the desired curve and have normalized it into a proper probability distribution, we need a way to tune the curve's width. Picture a knob we adjust to achieve this: the variance σ². We control the width through the standard deviation σ by replacing the x in our function with x/σ.

x \rightarrow \frac{x}{\sigma}
f(x)=\frac{1}{\sqrt{\pi}}e^{-\left(\frac{x}{\sigma}\right)^{2}} = \frac{1}{\sqrt{\pi}}e^{-x^{2}/\sigma^2}

Stretching the curve horizontally inflates its area, so we need to re-normalize once more to keep the total probability at 1, which means ensuring:

\int_{-\infty}^{\infty}f(x)\,dx=1

We apply the standard deviation once more, this time in the denominator of the normalizing constant:

f(x)= \frac{1}{\sigma\sqrt{\pi}}e^{-x^{2}/\sigma^2}
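To see why a single extra factor of σ restores the normalization, substitute t = x/σ (so dx = σ dt) in the stretched integral:

\int_{-\infty}^{\infty} e^{-x^{2}/\sigma^{2}}\,dx = \sigma\int_{-\infty}^{\infty} e^{-t^{2}}\,dt = \sigma\sqrt{\pi}

Dividing the prefactor by σ cancels that extra factor and brings the total area back to 1.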

Out of convenience for when we derive the formula further, we rescale the width parameter by substituting √2·σ for σ. This places the conventional factor of 2 in the exponent's denominator and, as a bonus, makes σ² exactly the variance of the distribution:

f(x)= \frac{1}{\sqrt{2}\,\sigma\sqrt{\pi}}\,e^{-x^{2}/(2\sigma^2)} = \frac{1}{\sigma\sqrt{2\pi}}e^{\frac{-x^2}{2\sigma^2}}

Our function is normalized and centered at 0. Now we need to re-express it so we can shift the curve horizontally to any center, the mean μ. We do this by replacing x with x - μ.

f(x)= \frac{1}{\sigma\sqrt{2\pi}}\,e^{\frac{-(x-\mu)^2}{2\sigma^2}} \quad\rightarrow\quad f(x)= \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{\frac{-(x-\mu)^2}{2\sigma^2}}

We're almost there. To apply this to the original coin-toss experiment, we substitute the binomial's own mean and variance: μ = np and σ² = np(1 - p) = npq, where q = 1 - p. That gives the De Moivre-Laplace approximation:

\binom{n}{k}p^{k}q^{n-k}\approx\frac{1}{\sqrt{2\pi npq}}\,e^{\frac{-(k-np)^2}{2npq}}
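A minimal Python sketch comparing the two sides of this approximation, reusing the log-space pmf from earlier:

```python
import math

def binom_pmf(k, n, p):
    # Exact binomial probability, computed in log space to avoid overflow.
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1 - p))

def de_moivre_laplace(k, n, p):
    mu, var = n * p, n * p * (1 - p)  # mean np, variance npq
    return math.exp(-((k - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 10_000, 0.5
for k in (5000, 5040, 5100, 5200):
    print(k, binom_pmf(k, n, p), de_moivre_laplace(k, n, p))
```

For k near the mean (np = 5,000) the two columns agree to several decimal places, which is exactly the point of the theorem.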

You no longer need to evaluate enormous binomial coefficients and n-th powers, which drastically simplifies the calculation of the binomial distribution. This gets at the heart of the Central Limit Theorem: add up enough independent (binary) random events and the overall distribution looks normal.

What's most inspiring behind this formula is the instinct mathematicians have to take a proven model of reality and simplify it further by finding its closest analog. Approximations like this one tame the computational complexity of modeling random processes.