Central Limit Theorem
The central limit theorem says that the probability distribution of the arithmetic means of different samples taken from the same population will closely resemble a normal distribution, provided the samples are large enough. As a common rule of thumb, a sample size of 30 or more is treated as large enough for the approximation to be useful.
A key characteristic of the central limit theorem is that the average of the sample means approximates the population mean, while the standard deviation of the sample means approximates the population standard deviation divided by √n, where n is the sample size. In this article, we will learn more about the central limit theorem, its formula, proof, various applications, and examples.
1. What is Central Limit Theorem?
2. Central Limit Theorem Formula
3. Central Limit Theorem Proof
4. Central Limit Theorem Application
5. FAQs on Central Limit Theorem
What is Central Limit Theorem?
The central limit theorem establishes that if large samples are drawn from a population and their sums (or means) are taken, then those sums (or means) are approximately normally distributed. Furthermore, by the law of large numbers, the sample mean converges to the population mean as the sample size grows. The central limit theorem is often abbreviated as CLT.
Central Limit Theorem Definition
The central limit theorem states that, irrespective of a random variable's distribution (provided its variance is finite), if large enough samples are drawn from the population, then the sampling distribution of the mean for that random variable will approximate a normal distribution. A common rule of thumb treats samples of size 30 or more as large enough, although heavily skewed populations may need larger samples. In other words, as the sample size grows, the graph of the sample means looks more and more like a normal distribution.
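The definition can be checked informally with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 30, the 5000 repetitions, and the helper name `sample_means` are all assumptions chosen for the demo, not part of the theorem.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Return the means of num_samples independent samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(num_samples)]

# Assumed skewed population: exponential with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0), n=30, num_samples=5000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
# Share of sample means within one standard deviation of their center;
# a normal distribution puts about 68% of its mass there.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")   # close to the population mean 1
print(f"sd of sample means:   {spread:.3f}")   # close to 1 / sqrt(30)
print(f"within one sd:        {within_one_sd:.3f}")
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread near 1 / √30, and roughly two thirds of them fall within one standard deviation of the center.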
Central Limit Theorem Example
Example: A set of samples has been collected from a larger population and the sample mean values are 12.8, 10.9, 11.4, 14.2, 12.5, 13.6, 15, 9, 12.6. Estimate the population mean.
Solution: The given sample mean values are 12.8, 10.9, 11.4, 14.2, 12.5, 13.6, 15, 9, 12.6.
The population mean is estimated by the average of the above sample mean values:
\(\mu = \frac{12.8 +10.9 +11.4 + 14.2 +12.5 +13.6 +15 + 9 +12.6}{9}\)
=\(\frac{112}{9} \approx 12.4\)
Answer: Hence the estimated population mean is approximately 12.4
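The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch using Python's standard library; the variable names are chosen here for illustration.

```python
import statistics

# Sample mean values given in the example.
sample_mean_values = [12.8, 10.9, 11.4, 14.2, 12.5, 13.6, 15, 9, 12.6]

# The average of the sample means serves as the estimate of the population mean.
estimate = statistics.fmean(sample_mean_values)
print(round(estimate, 1))  # 12.4
```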
Central Limit Theorem Formula
Suppose there is a random variable, X, with an unknown or a known probability distribution. Let \(\sigma\) be the standard deviation and \(\mu\) be the mean of X. If a large number of samples of size n are drawn, then according to the central limit theorem formula, the new random variable, \(\overline{X}\), consisting of the sample means, will be approximately normally distributed. This is given as:
\(\overline{X}\sim N\left ( \mu,\frac{\sigma}{\sqrt{n}} \right )\)
Here the second parameter denotes the standard deviation. Thus, the central limit formula says that the random variable of the sample means will be approximately normally distributed with a mean equal to the mean of the original distribution and a standard deviation given by σ / √n.
Then z score for this random variable, \(\overline{X}\), is given as follows:
\(z = \frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\)
where \(\overline{x}\) is the observed value of the sample mean \(\overline{X}\), n is the sample size, and
\(\mu\) and \(\sigma\) are the mean and standard deviation of X respectively.
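The z score formula translates directly into code. The function below is a minimal sketch; the name `z_score` and the numbers in the usage line are hypothetical and chosen only to illustrate the calculation.

```python
import math

def z_score(sample_mean, mu, sigma, n):
    """Standardize a sample mean using the standard error sigma / sqrt(n)."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    return (sample_mean - mu) / (sigma / math.sqrt(n))

# Hypothetical numbers: population mean 50, standard deviation 10, sample size 25.
z = z_score(sample_mean=53, mu=50, sigma=10, n=25)
print(z)  # 1.5
```

With these inputs the standard error is 10 / 5 = 2, so a sample mean of 53 lies 1.5 standard errors above the population mean.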
Central Limit Theorem Proof
Let us suppose we have \(X_{1}\), \(X_{2}\), ..., \(X_{n}\), independent and identically distributed random variables with variance \(\sigma^{2}\) = 1 and mean \(\mu\) = 0.
Let M(t) be the moment generating function of each \(X_{i}\).
M(0) = 1
M'(0) = E[\(X_{i}\)] = \(\mu\) = 0
M''(0) = E[\(X_{i}^{2}\)] = 1
The moment generating function of \(X_{i}\) / √n is \(E\left [ e^{\frac{tX_{i}}{\sqrt{n}}} \right ]\) = \(M\left ( \frac{t}{\sqrt{n}} \right )\)
By independence, the mgf of (\(X_{1}\) + \(X_{2}\) + ... + \(X_{n}\)) / √n is \(\left [ M\left ( \frac{t}{\sqrt{n}} \right ) \right ]^{n}\)
Let L(t) = log M(t)
L(0) = log M(0) = 0
L'(0) = M'(0) / M(0) = \(\mu\) / 1 = 0
L''(0) = \(\frac{M(0)M''(0)-M'(0)^{2}}{M(0)^{2}}\) = 1
Using L'Hospital's rule twice, we find that \(nL\left ( \frac{t}{\sqrt{n}} \right )\) tends to \(\frac{t^{2}}{2}\) as n tends to infinity.
Thus, \(\left [ M(\frac{t}{\sqrt{n}}) \right ]^{n}\) = \(\left [ e^{L\left ( \frac{t}{\sqrt{n}} \right )} \right ]^{n}\)
= \(e^{nL\left ( \frac{t}{\sqrt{n}} \right )}\) \(\rightarrow\) \(e^{\frac{t^{2}}{2}}\) as n \(\rightarrow \infty\)
This is the moment generating function of a standard normal distribution, thus proving the central limit theorem.
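The limit step in the proof can be spelled out by substituting x = 1 / √n, so that n = 1 / x² and x tends to 0 as n tends to infinity. The following is a sketch that assumes M(t) is finite in a neighbourhood of 0, so that L is twice continuously differentiable there:

\(\lim_{n \to \infty} nL\left ( \frac{t}{\sqrt{n}} \right ) = \lim_{x \to 0} \frac{L(tx)}{x^{2}} = \lim_{x \to 0} \frac{tL'(tx)}{2x} = \lim_{x \to 0} \frac{t^{2}L''(tx)}{2} = \frac{t^{2}}{2}L''(0) = \frac{t^{2}}{2}\)

Both applications of L'Hospital's rule are justified because L(0) = 0 and L'(0) = 0, which gives a 0 / 0 form each time.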
Central Limit Theorem Application
The central limit theorem is widely used in scenarios where the characteristics of the population have to be identified but analyzing the complete population is difficult. Other applications of the central limit theorem are listed below:
- In data science, the central limit theorem is used to make accurate assumptions of the population in order to build a robust statistical model.
- In applied machine learning, the CLT helps to make inferences about the model performance.
- In statistical hypothesis testing the central limit theorem is used to check if the given sample belongs to a designated population.
Important Notes on Central Limit Theorem
- The central limit theorem states that if the sample size is large enough, then the sampling distribution of the means will approximate a normal distribution.
- The mean of the sampling distribution of the sample mean is the same as the population mean.
- The standard deviation of the sample mean (the standard error) is given by (Population standard deviation) / √n.
- The z score for a sample mean is \(z = \frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
Examples on Central Limit Theorem
Example 1: In a study, it was reported that the mean age of mobile users is 30 years and the standard deviation is 12. Taking a sample size of 100, what are the mean and standard deviation of the sample mean age of mobile users?
Solution: Since the mean of the sample means equals the population mean, the mean is 30.
The standard deviation of the sample mean is \(\frac{\sigma}{\sqrt{n}}\) = 12 / 10 = 1.2
Answer: Mean = 30, Standard deviation = 1.2
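The same calculation can be wrapped in a small helper. This is a minimal sketch; the function name `sampling_distribution` is an assumption made for illustration.

```python
import math

def sampling_distribution(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n."""
    return mu, sigma / math.sqrt(n)

# Values from Example 1: population mean 30, standard deviation 12, sample size 100.
mean, std_error = sampling_distribution(mu=30, sigma=12, n=100)
print(mean, std_error)  # 30 1.2
```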
Example 2: An unknown distribution has a mean of 80 and a standard deviation of 24. If a sample of size 36 is randomly drawn from this population, then using the central limit theorem find the value that is two standard deviations of the sample mean above the expected value.
Solution: We know that the mean of the sample means equals the mean of the population. Thus, mean = 80.
Standard deviation of the sample mean = \(\frac{\sigma}{\sqrt{n}}\) = 24 / 6 = 4
Thus the value is 80 + 2 (4) = 88.
Answer: The value that is two standard deviations of the sample mean above the expected value is 88.
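Example 3: Suppose the mean age of people living in a town is 45 years and the standard deviation is 10. What will be the mean and variance of the sample mean age for sample sizes 20 and 49?
Solution: The mean and variance of the sample mean are \(\mu\) and \(\frac{\sigma^{2}}{n}\) for any sample size. The central limit theorem only adds that the distribution of the sample mean is approximately normal when n is large.
When n = 20, the mean of the sample mean is 45 and its variance is 100 / 20 = 5. Since 20 is below the rule-of-thumb threshold of 30, the normal approximation should not be relied on here unless the ages are themselves roughly normally distributed.
When n = 49, the mean of the sample mean is 45.
Standard deviation of the sample mean = \(\frac{\sigma}{\sqrt{n}}\) = 10 / 7 ≈ 1.43
Variance of the sample mean = 100 / 49 ≈ 2.04
Answer: a) For n = 20, Mean = 45, Variance = 5 (normality not assured) b) For n = 49, Mean = 45, Variance ≈ 2.04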
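The formulas \(\mu\) for the mean and \(\frac{\sigma^{2}}{n}\) for the variance of the sample mean hold at any sample size, which a simulation can illustrate. The sketch below assumes normally distributed ages, an assumption not stated in the example, and the function name, repetition count, and seed are chosen only for the demo.

```python
import random
import statistics

def simulated_variance_of_mean(n, num_samples=4000, mu=45.0, sigma=10.0, seed=1):
    """Estimate the variance of the sample mean by repeated sampling."""
    rng = random.Random(seed)
    # Assumption: ages are drawn from a normal population with the given mu and sigma.
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.variance(means)

for n in (20, 49):
    theory = 10.0 ** 2 / n  # sigma squared divided by n
    print(n, round(theory, 2), round(simulated_variance_of_mean(n), 2))
```

The simulated variances should land close to 5 for n = 20 and close to 2.04 for n = 49, up to sampling noise.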
FAQs on Central Limit Theorem
What is the Central Limit Theorem in Statistics?
The central limit theorem in statistics states that, irrespective of the shape of the population distribution, the sampling distribution of the sample means approximates a normal distribution when the sample size is large enough. A sample size of 30 or more is a common rule of thumb.
What is the Central Limit Theorem Formula?
The central limit theorem gives a formula for the mean and the standard deviation of the sample means when the population mean and standard deviation are known. This is given as follows:
- Mean of the sample means = Population mean = \(\mu\)
- Standard deviation of the sample means = (Population standard deviation) / √n = σ / √n
How Do You Use the Central Limit Theorem?
The following steps can be applied to find a certain probability using the central limit theorem:
- Substitute values in the formula \(z = \frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) to compute the z score.
- Find the probability corresponding to this z score using the normal distribution table.
- Using this value, various probabilities can be calculated: P(\(\overline{X}\) > x), P(\(\overline{X}\) < x), P(a < \(\overline{X}\) < b).
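The steps above can be sketched in code, with `statistics.NormalDist` from Python's standard library standing in for the normal distribution table. The function name `prob_sample_mean_between` and the inputs in the usage line are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist

def prob_sample_mean_between(a, b, mu, sigma, n):
    """Approximate P(a < sample mean < b) using the CLT normal approximation."""
    se = sigma / math.sqrt(n)   # standard deviation of the sample mean
    z_a = (a - mu) / se         # z score of the lower bound
    z_b = (b - mu) / se         # z score of the upper bound
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)

# Hypothetical inputs: mu = 80, sigma = 24, n = 36, so the standard error is 4.
p = prob_sample_mean_between(72, 88, mu=80, sigma=24, n=36)
print(round(p, 4))  # 0.9545
```

Here the bounds 72 and 88 sit two standard errors either side of the mean, so the result matches the familiar figure of about 95% for a normal distribution.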
What is the Purpose of the Central Limit Theorem?
The central limit theorem helps to approximate the characteristics of a population in cases where it is difficult to gather data about each observation of the population.
Why is the Central Limit Theorem Important for Statistical Inference?
The central limit theorem helps to make important inferences about the population from a sample. It can be used to determine if two samples were drawn from the same population as well as to check if the sample was drawn from a certain population.
Do Confidence Intervals Rely on the Central Limit Theorem?
Confidence intervals do not strictly rely on the central limit theorem. However, the central limit theorem helps to construct confidence intervals for a mean by justifying a normal approximation to the sampling distribution of the sample mean.
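As an illustration, a normal-approximation interval for a population mean can be sketched as follows. This assumes the population standard deviation is known and the sample is large enough for the approximation to be reasonable; the function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Normal-approximation interval for the population mean, sigma assumed known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: sample mean 30, sigma 12, sample size 100.
low, high = mean_confidence_interval(sample_mean=30, sigma=12, n=100)
print(round(low, 2), round(high, 2))
```

When the population standard deviation is unknown and estimated from the sample, a t-based interval is the usual alternative, which this sketch does not cover.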
Does the Central Limit Theorem Apply to a Non-Normal Distribution?
The central limit theorem can be applied to a sample that has been taken from any distribution with a finite variance. It says that the arithmetic means of sufficiently large samples will approximately follow a normal distribution.