Statistics 101 — Normal Distribution
The normal distribution is a probability distribution for a normal random variable X. Its density curve is defined by the following equation:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where x is a value of the normal random variable X, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.
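As a rough illustration, the density can be evaluated directly in code. The sketch below assumes the standard form of the normal density, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)); the function name `normal_pdf` and its default arguments are illustrative choices, not part of the original text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Standard normal (mu = 0, sigma = 1): the curve is highest at the mean
# and symmetric around it.
at_mean = normal_pdf(0.0)
one_sd_above = normal_pdf(1.0)
one_sd_below = normal_pdf(-1.0)
print(f"{at_mean:.4f} {one_sd_above:.4f} {one_sd_below:.4f}")
```

Note that the value returned is the height of the curve at x, not a probability; probabilities come from areas under the curve.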
The Normal Curve
The graph of the normal distribution depends on two factors — the mean and the standard deviation. The mean of the distribution determines the location of the center of the graph, and the standard deviation determines the height and width of the graph. All normal distributions look like a symmetric, bell-shaped curve, as shown below.
[Figure: two normal curves, one labeled "Smaller standard deviation" and one labeled "Bigger standard deviation"]
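One way to see how the standard deviation controls the shape is to look at the peak of the curve. At x = μ the exponential term equals 1, so the peak height is 1 / (σ√(2π)): a smaller σ gives a taller, narrower curve, and a bigger σ gives a shorter, wider one. The short sketch below prints the peak height for a few σ values chosen for illustration; `peak_height` is a hypothetical helper name.

```python
import math

def peak_height(sigma):
    """Height of the normal curve at its mean, which depends only on sigma."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

for sigma in (0.5, 1.0, 2.0):
    print(f"sigma = {sigma}: peak height = {peak_height(sigma):.4f}")
# sigma = 0.5: peak height = 0.7979
# sigma = 1.0: peak height = 0.3989
# sigma = 2.0: peak height = 0.1995
```

Because the total area under the curve is fixed, halving the peak height goes hand in hand with doubling the spread.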
Probability and the Normal Curve
The normal distribution is a continuous probability distribution. This has several implications for probability.
- The total area under the normal curve is equal to 1.
- The probability that a normal random variable X equals any particular value is 0.
- The probability that X is greater than a equals the area under the normal curve bounded by a and plus infinity (as indicated by the non-shaded area in the figure below).
- The probability that X is less than a equals the area under the normal curve bounded by a and minus infinity (as indicated by the shaded area in the figure below).
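The area-based probabilities in the list above can be computed without a table. The sketch below is one possible approach using the error function from Python's standard library, relying on the identity P(X < a) = ½ · (1 + erf((a - μ) / (σ√2))). The values of μ, σ, and a are made up for illustration.

```python
import math

def normal_cdf(a, mu=0.0, sigma=1.0):
    """P(X < a): area under the normal curve from minus infinity up to a."""
    z = (a - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Hypothetical example: mean 100, standard deviation 15, cutoff a = 115.
mu, sigma, a = 100.0, 15.0, 115.0
p_less = normal_cdf(a, mu, sigma)   # area to the left of a
p_greater = 1.0 - p_less            # area to the right of a

print(f"P(X < {a}) = {p_less:.4f}")
print(f"P(X > {a}) = {p_greater:.4f}")
```

Since the probability that X equals exactly a is 0, it makes no difference whether the inequalities are strict, and the two areas always sum to 1.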
Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the following “rule”.
- About 68% of the area under the curve falls within 1 standard deviation of the mean.
- About 95% of the area under the curve falls within 2 standard deviations of the mean.
- About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Collectively, these points are known as the empirical rule or the 68–95–99.7 rule. Given a normal distribution, nearly all outcomes (about 99.7%) will fall within 3 standard deviations of the mean.
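The three percentages can be checked numerically. For any normal distribution, the area within k standard deviations of the mean equals erf(k / √2), which does not depend on μ or σ. The sketch below uses that identity; `within_k_sd` is an illustrative name.

```python
import math

def within_k_sd(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sd(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```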
Central Limit Theorem
The central limit theorem states that the sampling distribution of the mean of independent observations drawn from any population with finite variance will be normal or nearly normal if the sample size is large enough. How large is “large enough”? The answer depends on two factors.
- Requirements for accuracy. The more closely the sampling distribution needs to resemble a normal distribution, the more sample points will be required.
- The shape of the underlying population. The more closely the original population resembles a normal distribution, the fewer sample points will be required.
In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped. Others recommend a sample size of at least 40. But if the original population is distinctly not normal (e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers prefer an even larger sample size.
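A small simulation can make the effect of sample size concrete. The sketch below is only an illustration under stated assumptions: the population is exponential with mean 1 (a badly skewed distribution chosen for the example), the sample sizes 2 and 30 and the number of repetitions are arbitrary, and skewness is used as a rough gauge of how far the sampling distribution is from a symmetric bell shape.

```python
import random
import statistics

def sample_means(sample_size, num_samples=5000, seed=0):
    """Means of repeated samples drawn from a skewed (exponential, mean 1) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

small = sample_means(sample_size=2)
large = sample_means(sample_size=30)

print(f"n = 2:  mean = {statistics.fmean(small):.3f}, skewness = {skewness(small):.2f}")
print(f"n = 30: mean = {statistics.fmean(large):.3f}, skewness = {skewness(large):.2f}")
```

In a run like this, both sets of sample means center near the population mean of 1, but the skewness of the sample means should be noticeably closer to 0 at n = 30 than at n = 2, which is the kind of approach to normality the theorem describes.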