The proof and intuition presented here come from this excellent writeup by Yuval Filmus, which in turn draws upon ideas in this book by Fumio Hiai and Denes Petz. Suppose that we have a sequence of real-valued random variables $X_1, X_2, \ldots$.
Define the random variable
$$S_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}}$$
to be a scaled sum of the first $n$ variables in the sequence. Now, we would like to make interesting statements about the sequence $S_1, S_2, \ldots$.
The central limit theorem is quite general. To simplify this exposition, I will make a number of assumptions. First, I will assume that the $X_i$ are independent and identically distributed. Second, I will assume that each $X_i$ has mean $0$ and variance $1$. Finally, I will assume that the moment generating function of the $X_i$ (to be defined below) converges (this condition requires, in particular, that all moments of the distribution exist).
Under these conditions, the central limit theorem tells us that
$$S_n \to \mathcal{N}(0,1),$$
where $\mathcal{N}(0,1)$ is the normal distribution with density function
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$
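and where $S_n \to \mathcal{N}(0,1)$ means that $\Pr[S_n \in I] \to \Pr[\mathcal{N}(0,1) \in I]$ for all intervals $I$. It is not immediately obvious that the sequence $S_n$ should converge, but if it does converge, the normal distribution is the natural candidate. Suppose it converges to some distribution $D$. Now, consider the random variable
$$S_{2n} = \frac{X_1 + \cdots + X_n}{\sqrt{2n}} + \frac{X_{n+1} + \cdots + X_{2n}}{\sqrt{2n}}.$$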
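Of course, $S_{2n}$ is just $S_m$ with $m = 2n$, so $S_{2n} \to D$. But the first term on the RHS is $S_n/\sqrt{2}$, and the second term on the RHS has the same distribution as $S_n/\sqrt{2}$, so we have
$$S_{2n} \to \frac{D + D}{\sqrt{2}},$$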
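where the two $D$'s are independent random variables with the same distribution. So $D$ and $(D + D)/\sqrt{2}$ must have the same distribution. By grouping terms in different proportions, we can derive similar properties of $D$. Since the normal distribution satisfies (with the two copies on the left independent)
$$\frac{\mathcal{N}(0,1) + \mathcal{N}(0,1)}{\sqrt{2}} = \mathcal{N}(0,1),$$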
it is a natural candidate for the distribution $D$.
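The fixed-point property above can be checked empirically. The following is only a rough sketch, not part of the proof: it assumes the $X_i$ are uniform variables rescaled to mean $0$ and variance $1$ (my choice for illustration), and the helper names, the value $n = 20$, and the interval $[-1, 1]$ are all made up for this example. It estimates the probability of landing in the interval for $S_n$ and for $(S_n + S_n')/\sqrt{2}$, and compares both with the standard normal.

```python
import math
import random

def normal_interval_prob(a, b):
    """Pr[a <= Z <= b] for a standard normal Z, computed via the error function."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

def empirical_interval_prob(samples, a, b):
    """Fraction of samples that land in [a, b]."""
    return sum(a <= s <= b for s in samples) / len(samples)

def draw_scaled_sum(n, rng):
    """One draw of S_n = (X_1 + ... + X_n) / sqrt(n).

    Assumption for this sketch: X_i = sqrt(12) * (U - 1/2) with U uniform on [0, 1),
    which has mean 0 and variance 1.
    """
    total = sum(math.sqrt(12.0) * (rng.random() - 0.5) for _ in range(n))
    return total / math.sqrt(n)

rng = random.Random(0)  # fixed seed so the run is repeatable
trials, n = 20_000, 20
a, b = -1.0, 1.0

s_n = [draw_scaled_sum(n, rng) for _ in range(trials)]
# (S_n + S_n') / sqrt(2) with two independent copies; this is distributed like S_{2n}.
combined = [
    (draw_scaled_sum(n, rng) + draw_scaled_sum(n, rng)) / math.sqrt(2.0)
    for _ in range(trials)
]

p_normal = normal_interval_prob(a, b)
p_single = empirical_interval_prob(s_n, a, b)
p_combined = empirical_interval_prob(combined, a, b)
print(p_normal, p_single, p_combined)
```

All three printed numbers should be close to one another, up to sampling noise and a small finite-$n$ error.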
To prove the central limit theorem, we will make use of the moment generating function
$$M_X(t) = \mathbb{E}\left[e^{tX}\right] = \sum_{m=0}^{\infty} \frac{\mathbb{E}[X^m]}{m!} t^m$$
and the cumulant generating function
$$K_X(t) = \log M_X(t) = \sum_{m=1}^{\infty} \frac{\kappa_m(X)}{m!} t^m.$$
The coefficients of $t^m/m!$ in the moment generating function and the cumulant generating function (that is, the power series coefficients multiplied by $m!$) are referred to as “moments” and “cumulants” respectively. The cumulants and the moments are closely related, and the values of one determine the values of the other. Incidentally, the moment generating function and the cumulant generating function of the normal distribution $\mathcal{N}(0,1)$ are given by
$$M_{\mathcal{N}(0,1)}(t) = e^{t^2/2}, \qquad K_{\mathcal{N}(0,1)}(t) = \frac{t^2}{2}.$$
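Note that, for independent random variables $X$ and $Y$ and any constant $c$, the moment generating function satisfies
$$M_{X+Y}(t) = M_X(t)\, M_Y(t), \qquad M_{cX}(t) = M_X(ct).$$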
It follows that the cumulant generating function satisfies
$$K_{X+Y}(t) = K_X(t) + K_Y(t), \qquad K_{cX}(t) = K_X(ct).$$
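As a quick sanity check of these identities, here is a minimal numerical sketch. It assumes two small finite distributions of my own choosing, represented as `{value: probability}` dictionaries, and the helper names are hypothetical. It builds the distribution of an independent sum by enumeration and compares the generating functions at a single point $t$.

```python
import math

def mgf(dist, t):
    """M_X(t) = E[exp(t X)] for a finite distribution {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in dist.items())

def cgf(dist, t):
    """K_X(t) = log M_X(t)."""
    return math.log(mgf(dist, t))

def independent_sum(dx, dy):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for x, p in dx.items():
        for y, q in dy.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

def scale(dist, c):
    """Distribution of c * X (c is assumed nonzero)."""
    return {c * x: p for x, p in dist.items()}

# Two illustrative distributions, chosen arbitrarily for this sketch.
X = {-1.0: 0.5, 1.0: 0.5}
Y = {-0.5: 0.8, 2.0: 0.2}
t, c = 0.3, 1.7

XY = independent_sum(X, Y)
print(mgf(XY, t), mgf(X, t) * mgf(Y, t))   # multiplicativity of M
print(cgf(XY, t), cgf(X, t) + cgf(Y, t))   # additivity of K
print(cgf(scale(X, c), t), cgf(X, c * t))  # scaling rule for K
```

Each printed pair should agree up to floating-point rounding.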
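Now, we are going to use all of these tools. Let’s inspect the cumulant generating function of $S_n$. Writing $X$ for any one of the identically distributed $X_i$, we have
$$K_{S_n}(t) = K_{X_1}\!\left(\frac{t}{\sqrt{n}}\right) + \cdots + K_{X_n}\!\left(\frac{t}{\sqrt{n}}\right) = n\, K_X\!\left(\frac{t}{\sqrt{n}}\right).$$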
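Let $\kappa_m(Y)$ denote the $m$th cumulant of a random variable $Y$, that is, the coefficient of $t^m/m!$ in $K_Y(t)$. Equating powers of $t$ above, we get
$$\kappa_m(S_n) = n^{1 - m/2}\, \kappa_m(X).$$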
From the case $m = 1$, we see that
$$\kappa_1(S_n) = \sqrt{n}\, \kappa_1(X) = 0,$$
as expected, since the first cumulant is the mean. From the case $m = 2$, we see that
$$\kappa_2(S_n) = \kappa_2(X) = 1,$$
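also as expected, since the second cumulant is the variance. But what about higher cumulants? For $m \ge 3$, as $n \to \infty$, we have
$$\kappa_m(S_n) = n^{1 - m/2}\, \kappa_m(X) \to 0.$$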
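Therefore, the higher cumulants all vanish in the limit. Since the cumulants of $\mathcal{N}(0,1)$ are $0, 1, 0, 0, \ldots$, it follows that the cumulants of the sequence $S_n$ converge to the cumulants of $\mathcal{N}(0,1)$. Therefore, the moments of the sequence $S_n$ converge to the moments of $\mathcal{N}(0,1)$. It follows from Lévy’s continuity theorem that $S_n \to \mathcal{N}(0,1)$, as desired.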
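To see the scaling of the cumulants concretely, here is a small exact computation. It is a sketch under assumptions of my own: the two-point distribution (value $2$ with probability $1/5$, value $-1/2$ with probability $4/5$, which has mean $0$ and variance $1$) and the helper names are hypothetical, and $n$ is restricted to perfect squares so that $1/\sqrt{n}$ stays rational. Cumulants are obtained from raw moments via the standard moment-cumulant recursion, and the cumulants of $S_n$ computed by brute-force convolution are compared with $n^{1 - m/2}$ times the cumulants of a single variable.

```python
import math
from fractions import Fraction

def moments(dist, order):
    """Raw moments E[X^k] for k = 1..order of a finite distribution {value: prob}."""
    return [sum(p * x**k for x, p in dist.items()) for k in range(1, order + 1)]

def cumulants(dist, order):
    """Cumulants kappa_1..kappa_order via the moment-cumulant recursion."""
    m = [Fraction(1)] + moments(dist, order)  # m[k] = E[X^k], with m[0] = 1
    kappa = [Fraction(0)] * (order + 1)
    for k in range(1, order + 1):
        kappa[k] = m[k] - sum(
            math.comb(k - 1, j - 1) * kappa[j] * m[k - j] for j in range(1, k)
        )
    return kappa[1:]

def convolve(d1, d2):
    """Distribution of the sum of two independent finite random variables."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, Fraction(0)) + p * q
    return out

def scaled_sum_dist(dist, n, scale):
    """Distribution of scale * (X_1 + ... + X_n) for i.i.d. copies of dist."""
    total = {Fraction(0): Fraction(1)}
    for _ in range(n):
        total = convolve(total, dist)
    return {scale * x: p for x, p in total.items()}

# Illustrative distribution with mean 0 and variance 1 (an assumption of this sketch).
X = {Fraction(2): Fraction(1, 5), Fraction(-1, 2): Fraction(4, 5)}
kappa_X = cumulants(X, 4)

for n in (4, 16, 64):
    r = math.isqrt(n)  # n is a perfect square, so sqrt(n) = r exactly
    exact = cumulants(scaled_sum_dist(X, n, Fraction(1, r)), 4)
    # n^(1 - m/2) = r^(2 - m) when n = r^2
    predicted = [Fraction(r) ** (2 - m) * kappa_X[m - 1] for m in range(1, 5)]
    assert exact == predicted
    print(n, [str(k) for k in exact])
```

For this distribution the third and fourth cumulants of a single variable are $3/2$ and $1/4$; for $S_4$ they shrink to $3/4$ and $1/16$, and they keep shrinking as $n$ grows, while the first two cumulants stay at $0$ and $1$.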