Statistics 101 — Mean, Variance, Standard Deviation
Mean:
The sample mean is the mean of the values in a collected sample.
The population mean is the mean of all the values in the population.
If the sample is random and the sample size is large, the sample mean is a good estimate of the population mean.
A point estimate is the value of a statistic that estimates the value of a parameter. For example, the sample mean is a point estimate of the population mean.
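As a rough illustration of a sample mean acting as a point estimate, the sketch below draws a random sample from a made-up population and compares the two means. The population values, the sample size of 50, and the seed are all assumptions chosen for the example, not data from any real study.

```python
import random
from statistics import mean

# Hypothetical population: 101 weights, 150 through 250 pounds.
population = list(range(150, 251))
population_mean = mean(population)

# Fixed seed so the draw is repeatable; the seed value is arbitrary.
rng = random.Random(0)
sample = rng.sample(population, 50)

# The sample mean is the point estimate of the population mean.
sample_mean = mean(sample)

print(f"population mean: {population_mean}")
print(f"sample mean (point estimate): {sample_mean:.1f}")
```

The two numbers will usually be close but not identical, which is why a point estimate is often reported together with a margin of error.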
Suppose that you want to find out the average weight of all players on the football team at Landers College. You are able to select ten players at random and weigh them. The mean weight of the sample of players is 198 pounds, so that number is your point estimate. Assume that the population standard deviation is σ = 11.50 pounds. What is a 90 percent confidence interval for the population mean weight, if you presume the players’ weights are normally distributed?
This question is the same as asking what weight values correspond to the upper and lower limits of an area of 90 percent in the center of the distribution. The z-scores that leave a probability of 0.05 in each tail of the distribution are −1.65 and 1.65. You can determine the weights that correspond to these z-scores using the following formula, where x̄ is the sample mean, σ is the population standard deviation, and n is the sample size:
x̄ ± z · σ / √n = 198 ± 1.65 × 11.50 / √10 ≈ 198 ± 6
The weight values for the lower and upper ends of the confidence interval are 192 and 204 (see Figure 1). A confidence interval is usually expressed by two values enclosed by parentheses, as in (192, 204). Another way to express the confidence interval is as the point estimate plus or minus a margin of error; in this case, it is 198 ± 6 pounds. You are 90 percent certain that the true population mean of football player weights is between 192 and 204 pounds.
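The interval above can be checked with a short sketch. The function name `z_confidence_interval` is made up for this example, and the code assumes a known population standard deviation, as in the football example. It uses the exact critical value from the standard library (about 1.645) rather than the rounded 1.65, so the limits differ slightly before rounding.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided z interval for a population mean when sigma is known."""
    # Critical z value that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Margin of error: z times the standard error of the mean.
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, z

lower, upper, z = z_confidence_interval(198, 11.50, 10, 0.90)
print(f"z = {z:.3f}, interval = ({lower:.1f}, {upper:.1f})")
```

Rounded to whole pounds, the limits come out to 192 and 204, matching the interval in the text.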
Variance:
Difference between population variance and sample variance
The main difference between population variance and sample variance relates to how the variance is calculated. Variance is calculated in five steps. First, the mean is calculated. Second, the deviations from the mean are calculated. Third, the deviations are squared. Fourth, the squared deviations are summed. Finally, this sum is divided by the number of items for which the variance is being calculated. Thus variance = Σ(xi − x̄)² / n, where xi is the i-th value, x̄ is the mean, and n is the number of items.
Now, when the variance is calculated from population data, n is equal to the number of items. Thus, if the variance in blood pressure of a population of 1000 people is calculated from the blood pressures of all 1000 people, then n = 1000. However, when the variance is calculated from sample data, 1 is deducted from n before dividing the sum of the squared deviations. Thus, in the above example, if the sample data have 100 items, the denominator would be 100 − 1 = 99.
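Because of this, the variance calculated from sample data is higher than the value the same data would give with n in the denominator. The logic is to compensate for our lack of information about the population: the deviations are measured from the sample mean rather than the unknown population mean, which tends to understate the true spread, and dividing by n − 1 corrects for that.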
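To make the effect of the denominator concrete, here is a minimal sketch that computes both versions by hand and compares them with the standard library. The blood pressure readings are invented for illustration, and `sum_squared_deviations` is a helper name introduced only for this example.

```python
from statistics import pvariance, variance

def sum_squared_deviations(values):
    """Sum of squared deviations from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

# Hypothetical blood pressure readings (not real data).
readings = [118, 122, 130, 126, 124]

ss = sum_squared_deviations(readings)
population_var = ss / len(readings)    # divide by n
sample_var = ss / (len(readings) - 1)  # divide by n - 1

print(population_var, sample_var)

# The standard library applies the same two denominators.
print(pvariance(readings), variance(readings))
```

With the same sum of squared deviations, the n − 1 version is always the larger of the two, and the gap shrinks as the number of items grows.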
SD:
Standard deviation measures the spread of a data distribution. It measures the typical distance between each data point and the mean.
The formula we use for standard deviation depends on whether the data is being considered a population on its own, or the data is a sample representing a larger population.
- If the data is being considered a population on its own, we divide by the number of data points, N.
- If the data is a sample from a larger population, we divide by one fewer than the number of data points in the sample, n − 1.
The steps in each formula are all the same except for one — we divide by one less than the number of data points when dealing with sample data.
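In Python, the standard library's `statistics` module offers one function for each case: `pstdev` divides by N and `stdev` divides by n − 1. The sketch below applies both to the same made-up data set, purely to show how the choice changes the result.

```python
from statistics import pstdev, stdev

# Hypothetical data set used only for illustration.
values = [10, 12, 23, 23, 16, 23, 21, 16]

# Treat the data as a complete population: divide by N.
population_sd = pstdev(values)

# Treat the data as a sample from a larger population: divide by n - 1.
sample_sd = stdev(values)

print(f"population SD: {population_sd:.3f}")
print(f"sample SD:     {sample_sd:.3f}")
```

Which function is appropriate depends on how the data was collected, not on the numbers themselves.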
We’ll go through each formula step by step in the examples below.
Here’s how to calculate population standard deviation:
Step 1: Calculate the mean of the data — this is μ in the formula.
Step 2: Subtract the mean from each data point. These differences are called deviations. Data points below the mean will have negative deviations, and data points above the mean will have positive deviations.
Step 3: Square each deviation so that every value is non-negative.
Step 4: Add the squared deviations together.
Step 5: Divide the sum by the number of data points in the population. The result is called the variance.
Step 6: Take the square root of the variance to get the standard deviation.
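The six steps above can be sketched in Python as follows. The function name `population_std_dev` and the data set are assumptions made for this example; each line is labeled with the step it carries out.

```python
import math

def population_std_dev(values):
    """Population standard deviation, following the six steps."""
    n = len(values)
    mu = sum(values) / n                    # Step 1: mean
    deviations = [x - mu for x in values]   # Step 2: deviations
    squared = [d ** 2 for d in deviations]  # Step 3: square each one
    total = sum(squared)                    # Step 4: add them up
    variance = total / n                    # Step 5: divide by N
    return math.sqrt(variance)              # Step 6: square root

# Hypothetical data set: mean 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(data))  # 2.0
```

For sample data, only Step 5 would change: the sum would be divided by `n - 1` instead of `n`.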