🥸
Ahmad Zaheer Saffi
🛎️

De Moivre-LaPlace Theorem

An elegant alternative to calculating a cumbersome binomial distribution — the efforts of Gauss, De Moivre, and LaPlace leverages an alternate approach that makes life easier.

If we want to flip a coin, we know of its probability → p=1/2. However when running our live experiments, it doesn’t always stay at 1/2 but slightly deviates. The Law of Large Numbers states that with a night enough n=number of experiments, we should gradually refine the actual probability to 1/2.

What if we want to understand the probability of obtaining a certain outcome? For example if we are flipping a coin 10,000 times, how likely is it that you get exactly 5,040 heads? We would use a binomial probability to determine this, which is defined as follows.

P(k heads)=(nk)pk(1−p)n−kP(\text{\it{k} heads})= \binom{n}{k}p^{k}(1−p)^{n−k}P(k heads)=(kn​)pk(1−p)n−k

Approaching this problem by hand (or even a computer) is cumbersome with an nnn as high as 10,000. Much of mathematics is distilling an existing formula into a simpler one (think of the Fourier Series) and this problem is no exception.

Plotting the binomial distribution of 10,000 coin flips would look like the graph below. So if we wanted to produce a simpler equation, we would need to produce this output. The graph takes a familiar shape known as the bell curve which gives a starting point.

image

Our binomial distribution uses discrete increments (no floats) while the bell curve is a continuous function. When you chart the outputs as a histogram, with a sufficiently higher nnn, you achieve a smooth-like function just as you would when distilling a continuous function through derivatives and integrals.

Now, the goal becomes creating a function that fits the normal distribution, a much easier function to work with.

Through trial and error, they landed on the best candidate being the following:

f(x)=e−x2f(x)=e^{−x^{2}}f(x)=e−x2

Which closely mirrors the shape we are looking to achieve.

image
image

The issue is that our probability must adhere to 0≤P(A)≤10\leq{P(A)}\leq{1}0≤P(A)≤1 but this curve’s total area does not equal 1. Essentially we want to find the function that does the following:

∫−∞∞f(x)dx=1\int_{-\infty}^{\infty} f(x)dx = 1∫−∞∞​f(x)dx=1

This involves a scaling exercise where we provide it parameters to fit into a probability distribution — an integral does this as you provide it boundaries and its sole purpose is to produce the area within the curve.

∫−∞∞e−x2dx=?\int_{-\infty}^{\infty} e^{-x^2} dx = ?∫−∞∞​e−x2dx=?

This integral can’t be solved the usual way because you cannot take the anti-derivative of this function. Said differently, there’s no way to distill this into elementary functions using a combination of polynomials, exponentials, logarithms, trig functions, or roots that equals the integral. So the clever idea was to square the integral and turn this into a 2-dimensional problem using the Gaussian integral trick.

We start by defining the integral:

I=∫−∞∞e−x2dxI=\int_{-\infty}^{\infty} e^{-x^2} dxI=∫−∞∞​e−x2dx

Then we square the integral in a manner that produces a double integral.

I2=(∫−∞∞e−x2dx)(∫−∞∞e−y2dy)I^{2}=\left(\int_{-\infty}^{\infty} e^{-x^2} dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2} dy\right)I2=(∫−∞∞​e−x2dx)(∫−∞∞​e−y2dy)I2=∫−∞∞∫−∞∞e−(x2+y2)dx dyI^{2} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)} dx\,dyI2=∫−∞∞​∫−∞∞​e−(x2+y2)dxdy

Above you see that this squared integral measures the area of the curve along the incremental changes across both xxx and yyy axes. The x2+y2x^{2}+y^{2}x2+y2 immediately tells us there’s a circle at play as shown in the following:

image
image

The circle also signals to us the need for using polar coordinates. x2+y2=r2x^{2}+y^{2}=r^{2}x2+y2=r2 which allows us to shift radians, therefore we need to adjust our xxx and yyy coordinates to be in line.

x=rcos⁡θ   →   y=rsin⁡θ   →   dx dy=r dr dθx=r\cos\theta \: \: \: \rightarrow \: \: \: y=r\sin\theta \: \: \: \rightarrow \: \: \: dx \:{dy} = r \: dr\:d\thetax=rcosθ→y=rsinθ→dxdy=rdrdθ

We plug these values back into our double integral and get:

I2=∫02π∫0∞e−r2⋅r dr dθI^{2} = \int_{0}^{2\pi} \int_{0}^{\infty} e^{-r^2}\cdot {r}\: dr \: d\thetaI2=∫02π​∫0∞​e−r2⋅rdrdθ

We start by solving the inner integral first:

∫0∞re−r2dr\int_{0}^{\infty} re^{-r^2}dr∫0∞​re−r2dr

We’ll use u-substitution to simplify this integration, converting r2r^2r2 to u :

u=r2   →   dudr=2r   →   du=2r dru=r^2 \: \: \: \rightarrow \: \: \: \frac{du}{dr}=2r \: \: \: \rightarrow \: \: \: du=2r \: dru=r2→drdu​=2r→du=2rdr

This positions us to swap out r drr \: drrdr and simplify the integration:

r dr=12dur \: dr=\frac{1}{2}durdr=21​du∫0∞e−u⋅12du=12∫0∞e−udu\int_{0}^{\infty} e^{-u}\cdot\frac{1}{2}du=\frac{1}{2}\int_{0}^{\infty} e^{-u}du∫0∞​e−u⋅21​du=21​∫0∞​e−udu

The integral of e−ue^{-u}e−u from 000 to ∞\infty∞ is 111, therefore we have the value 12\frac{1}{2}21​ to plug into the second integral.

I2=∫02π(12)dθ =12⋅2π=πI^{2} = \int_{0}^{2\pi} \left(\frac{1}{2}\right)d\theta\:{=}\frac{1}{2}\cdot{2\pi}=\piI2=∫02π​(21​)dθ=21​⋅2π=πI2=π   →   I=πI^{2}=\pi \: \: \: \rightarrow \: \: \: I=\sqrt{\pi}I2=π→I=π​∫−∞∞e−x2dx=π\int_{-\infty}^{\infty} e^{-x^2}dx=\sqrt{\pi}∫−∞∞​e−x2dx=π​

We’ll re-express this back as a function — to represent this as a proper probability distribution that equates to 111, we divide it by π\sqrt{\pi}π​.

f(x)=1πe−x2f(x)=\frac{1}{\sqrt\pi}e^{-x^2}f(x)=π​1​e−x2

Now that we have the desired curve to fit our binomial distribution and we normalized it to 1 so we can represent it as a proper probability distribution, we need to tune the width of our curve. Picture a knob we adjust to achieve this, using a variance σ2\sigma^2σ2. We generalize the overall width (variance) by using the standard deviation σ\sigmaσ, thereby re-representing the xxx in our function by dividing it by σ\sigmaσ.

x→xσx \rightarrow \frac{x}{\sigma}x→σx​f(x)=1πe−(xσ)2=1πe−x2/σ2f(x)=\frac{1}{\sqrt\pi}e^{{−(\frac{x}{\sigma})}^{2}} = \frac{1}{\sqrt{\pi}}e^{{−x^{2}/{\sigma^2}}}f(x)=π​1​e−(σx​)2=π​1​e−x2/σ2

We need to re-normalize this once more to ensure we stay within the bounds of 111, which means we need to ensure:

∫−∞∞f(x)dx=1\int_{-\infty}^{\infty}f(x)dx=1∫−∞∞​f(x)dx=1

We re-apply the standard deviation, this time to the denominator:

f(x)=1σπe−x2/σ2f(x)= \frac{1}{\sigma\sqrt{\pi}}e^{{−x^{2}/{\sigma^2}}}f(x)=σπ​1​e−x2/σ2

Out of convenience for when we need to derive the formula further, we will take the power of eee and apply 2 to the denominator and square it entirely —thereby transferring the 2 to the denominator as follows:

f(x)=1σπ(e−x2/2σ2)2=1σ2πe−x22σ2f(x)= \frac{1}{\sigma\sqrt{\pi}}\left(e^{{−x^{2}/{2\sigma^2}}}\right)^2 = \frac{1}{\sigma\sqrt{2\pi}}e^\frac{−x^2}{2\sigma^2}f(x)=σπ​1​(e−x2/2σ2)2=σ2π​1​e2σ2−x2​

Our function is normalized and it is also centered at 000, now we need to re-express this function so we can easily shift the curve in any horizontal direction which can be done along the mean μ\muμ. We do this by translating xxx into μ\muμ.

f(x)=1σ2πe−(x−μ)22σ2  →  f(x)=12πσ2e−(x−μ)22σ2f(x)= \frac{1}{\sigma\sqrt{2\pi}}e^\frac{−(x-\mu)^2}{2\sigma^2} \: \: \rightarrow \: \: f(x)= \frac{1}{\sqrt{2\pi\sigma^2}}e^\frac{−(x-\mu)^2}{2\sigma^2}f(x)=σ2π​1​e2σ2−(x−μ)2​→f(x)=2πσ2​1​e2σ2−(x−μ)2​

We’re almost there — this is a probability distribution so we need to adjust some of the variables to account for that original coin-toss experiment we wanted to run.

(nk)pkqn−k≈12πnpqe−(k−np)22npq\binom{n}{k}p^{k}q^{n-k}\approx\frac{1}{\sqrt{{2}\pi{npq}}}e^\frac{-(k-np)^2}{2npq}(kn​)pkqn−k≈2πnpq​1​e2npq−(k−np)2​

You no longer need to take the function to the nnn power which drastically simplifies the calculation of the binomial distribution. It gets at the heart of the Central Limit Theorem and shows that if you average enough random (binary) events, the overall distribution looks normal.

What’s most inspiring behind this formula is the instinct mathematicians have to take a proven, modeled reality and simplify it further by finding its closest analog. Functions like these simplify computational complexity in modeling random processes.