The statistical problem is this: suppose you have a sample average of 60,
based on n=100 observations from a population that has a standard
deviation of \(\sigma = 10\). From the last chapter we know that sample
averages can only get a couple of standard deviations away from \(\mu\), so
the idea is that \(\mu\) must be within a couple of standard deviations of
60, either above it or below it. The standard deviation of \(\bar{x}\) here
is \(\sigma/\sqrt{n} = 10/\sqrt{100} = 1\), so we would expect \(\mu\) to
probably be somewhere between about 58 and 62. This gives us a range of
plausible values for \(\mu\), and this range is called a confidence
interval for \(\mu\).
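As a quick check of the arithmetic behind the 58-to-62 range, assuming \(\sigma = 10\) (the value implied by n = 100 and a standard deviation of \(\bar{x}\) equal to one) and reading "a couple" as two:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1,
\qquad
60 \pm 2\,\sigma_{\bar{x}} = 60 \pm 2 = (58,\ 62)
```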
The precise formula for calculating a confidence interval for \(\mu\) is:

\[ \bar{x} \pm z^{*} \frac{\sigma}{\sqrt{n}} \]

where \(\bar{x}\) is the sample average, \(\sigma\) is the standard deviation
of the population measurements, and n is the sample size. The multiplier
\(z^{*}\) is a value from the standard normal table. For example, if we
want a 95 percent confidence interval for \(\mu\), we use \(z^{*} = 1.96\).
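The formula translates directly into a few lines of code. This is a minimal sketch; the function name `z_confidence_interval` is our own, and the usage line applies it to the opening example assuming \(\sigma = 10\), the value implied by a standard deviation of \(\bar{x}\) equal to one with n = 100.

```python
import math


def z_confidence_interval(xbar: float, sigma: float, n: int, z_star: float) -> tuple[float, float]:
    """Return (lower, upper) for xbar ± z* · sigma / sqrt(n)."""
    if n <= 0 or sigma < 0:
        raise ValueError("need n > 0 and sigma >= 0")
    margin = z_star * sigma / math.sqrt(n)  # half-width of the interval
    return xbar - margin, xbar + margin


# Opening example: xbar = 60, n = 100, sigma = 10 (inferred), 95% confidence => z* = 1.96
low, high = z_confidence_interval(60, 10, 100, 1.96)
print(f"({low:.2f}, {high:.2f})")  # prints (58.04, 61.96)
```

Using 1.96 instead of "a couple" (2) tightens the rough 58-to-62 range slightly.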
Why is this the correct value? The value of \(z^{*}\) is found by capturing
probability .95 between two boundaries placed symmetrically around zero
under the standard normal curve. That leaves .025 in each tail, so we look
up the upper boundary with .975 to its left, which gives 1.96 from Table A.
Verify that a 90 percent confidence interval will use \(z^{*} = 1.645\)
(or 1.65), and a 99 percent confidence interval will use about 2.57 (more
precisely 2.576).
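The table lookup described above can be reproduced with the inverse of the standard normal CDF, here taken from Python's standard-library `statistics.NormalDist`. The helper name `z_star` is ours; it puts half of the leftover probability in each tail and returns the upper boundary, just as you would when reading Table A backwards.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Upper boundary with (1 + C)/2 of the standard normal area to its left."""
    tail = (1 - confidence) / 2            # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)  # same lookup as reading Table A backwards


for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z* = {z_star(c):.3f}")
```

The three printed values round to 1.645, 1.960 and 2.576, matching the multipliers quoted in the text.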
Here is an example. The Digest of Education Statistics reports that a
study of 150 4-year colleges and universities had an average tuition and
fees of $16,107 with \(\sigma = \$4{,}241\). Give a 95 percent confidence
interval for the national average for tuition and fees. The solution is:

\[ 16{,}107 \pm 1.96 \frac{4{,}241}{\sqrt{150}} \approx 16{,}107 \pm 678.7 \approx (15{,}428,\ 16{,}786) \]

So the plausible range of values for the true average 4-year tuition and fees is probably somewhere between $15,400 and $16,800.
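The tuition calculation can be checked numerically. This is a sketch under one stated assumption: the population standard deviation of $4,241 is inferred from the reported margin of error and sample-size answer, not quoted from the Digest itself.

```python
import math

# sigma = 4,241 is an inferred value, not a figure quoted from the source
xbar, sigma, n, z_star = 16_107, 4_241, 150, 1.96
margin = z_star * sigma / math.sqrt(n)  # half-width of the 95% interval
low, high = xbar - margin, xbar + margin
print(f"margin = {margin:.1f}, interval = ({low:,.0f}, {high:,.0f})")
# prints: margin = 678.7, interval = (15,428, 16,786)
```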
What does the word "probably", or the 95 percent confidence, actually mean?
It means that if we constructed many 95 percent confidence intervals from
this population, about 95 percent of them would contain the true \(\mu\).
All we can say about our particular confidence interval is that it was
constructed using a reliable method that captures the truth about 95
percent of the time it is used, so it has probably captured the truth here
also.
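The "many many intervals" interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population with \(\mu = 60\) and \(\sigma = 10\), draws repeated samples of n = 100, builds \(\bar{x} \pm 1.96\,\sigma/\sqrt{n}\) each time, and reports the fraction of intervals that contain the true \(\mu\). The function name `coverage` and all parameter values are illustrative choices.

```python
import math
import random


def coverage(mu: float, sigma: float, n: int, z_star: float, reps: int, seed: int = 1) -> float:
    """Fraction of `reps` simulated intervals xbar ± z*·sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n  # one sample average
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Hypothetical population: mu = 60, sigma = 10; 2,000 samples of size 100.
print(coverage(mu=60, sigma=10, n=100, z_star=1.96, reps=2000))
```

The printed fraction should land close to 0.95; any single interval either contains \(\mu\) or it does not, and the 95 percent describes the method.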
A 99 percent confidence interval has a larger multiplier than a 95 percent
confidence interval (2.57 is larger than 1.96), so it makes a wider
confidence interval, which means we are more certain to have captured the
true \(\mu\) in our intervals. A nice diagram of the reliability
interpretation of confidence intervals is given in Figure 6.2.
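To see the trade-off between confidence and width, the sketch below builds 90, 95 and 99 percent intervals for the opening example (again assuming \(\sigma = 10\), inferred from a standard deviation of \(\bar{x}\) equal to one with n = 100) using the multipliers 1.645, 1.96 and 2.576.

```python
import math

xbar, sigma, n = 60, 10, 100  # opening example; sigma = 10 is inferred
levels = {"90%": 1.645, "95%": 1.96, "99%": 2.576}

widths = {}
for level, z_star in levels.items():
    margin = z_star * sigma / math.sqrt(n)
    widths[level] = 2 * margin  # full width = twice the margin of error
    print(f"{level}: {xbar - margin:.2f} to {xbar + margin:.2f} (width {widths[level]:.2f})")
```

Since the standard deviation of \(\bar{x}\) is one here, each width is simply twice the multiplier.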
If we want the margin of error, that is, the half-width of the confidence interval, to be m units, how big does the sample size need to be? The answer is given by the sample size formula:

\[ n = \left( \frac{z^{*} \sigma}{m} \right)^{2} \]
Always round up to the next highest integer, since you can't measure a fraction of an observation. The tuition example we did earlier had a margin of error of $678. How many subjects do we need if we want to be within $500 of the true average with 95 percent confidence?
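Here \(m = 500\), \(\sigma = 4{,}241\), and \(z^{*} = 1.96\) (because we
want a 95 percent confidence interval), so

\[ n = \left( \frac{1.96 \times 4{,}241}{500} \right)^{2} \approx 276.38 \]

Answer: n = 276.38, so round up to 277 observations.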
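The sample size formula and the round-up rule fit in one small function. This is a sketch; the name `required_n` is hypothetical, and the usage line assumes the inferred tuition standard deviation of $4,241.

```python
import math


def required_n(z_star: float, sigma: float, m: float) -> int:
    """Smallest n with z* · sigma / sqrt(n) <= m, i.e. ceil((z* · sigma / m) ** 2)."""
    raw = (z_star * sigma / m) ** 2
    return math.ceil(raw)  # always round up: a fraction of an observation can't be measured


# Tuition example: within $500 at 95% confidence, sigma = 4,241 (inferred value)
print(required_n(1.96, 4_241, 500))  # prints 277
```

With 277 observations the margin of error drops just under $500, while 276 would leave it slightly above, which is why rounding down is never allowed.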