Estimating a population parameter is probably the most common research activity
anywhere. It is the basis of market research, political polling, disease prevalence
surveys, and assessments of economic activity, the quality of schools, and so on.

Increasingly, the statistics involved are also used in quality control, where
data obtained from sampling are compared against a benchmark value. The Quality Statistics Explained Page
expands on this use of these statistics.

Designing a proper survey to estimate a population parameter involves many
technical issues, such as how to select the right population, which measurement
or question is most appropriate, how to control bias, and so on.

This site addresses only the issues of estimating the sample size required
during the planning stage and, once the data are obtained, estimating the accuracy
of the results.

**Glossary:** A number of terms are used in all population parameter
estimations.

- **Parameter value** is what we set out to establish.
- **Error** is the width of uncertainty on each side of the estimated value.
- **Confidence interval** is the range within which the true parameter value
is likely to lie. It is Value ± Error.
- **Percent of confidence interval** is the percent of the time that the confidence
interval would contain the true parameter value, should we repeat the survey
many times with the same sample size. It represents how sure we can be
that the true value is within the confidence interval. The most commonly used is
the 95% confidence interval, which means that if we were to repeat the survey many
times, about 95% of the intervals so calculated would contain the true value
(loosely, we are 95% sure).

There are therefore two calculations.

- At the planning stage, we define the confidence interval we require, and
hence the tolerable error. From these we can estimate the sample size required to do the job.
- When all the data have been collected, we know the central value, the variation,
and the sample size. From these we can estimate the error, and hence the
confidence interval of the results we have.

Establishing population means is a frequent research activity, particularly
in the educational, social, and biomedical fields. Educational departments
may wish to know the mathematical abilities of a cohort of school children,
obstetricians need to know the normal birth weight, and so on.

**Glossary**

- The **mean** is the mean value of the measurement of interest in the population.
- The **Standard Deviation** is the Standard Deviation of the measurement in the population.
This is mostly unknown and has to be guesstimated from previous studies or pilot studies.
- The **error** is half the width of the **confidence interval**, the distance on each side of the mean.
- **Sample size** is the number of subjects needed in the survey to achieve the
required precision.
- **Percent confidence interval** is the percent of time that the true
mean will be within the confidence interval, if the survey is repeated many times.

**Estimate sample size requirement:**
At the planning stage, the parameters needed to estimate the sample size
are the percent confidence interval (usually 95%), the standard deviation (SD)
expected in the population, and the error we will tolerate (er). The
sample size required to achieve this level of precision can then be estimated.

**An example**

We wish to establish the mean IQ of first-year university students. We expect
the standard deviation to be 10, and we want a 95% confidence interval based
on an error of ±2 IQ points (2 / 10 = 0.2 SDs). We can look up the Sample Size for Population Mean Page
and find that the sample size required is 99 subjects.
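
As a rough cross-check of the looked-up value, the calculation can be sketched with the Normal approximation n = (z × SD / er)². This is only a sketch under that assumption, not the program behind the Sample Size for Population Mean Page, and the function name `sample_size_mean` is hypothetical. It gives 97 rather than 99. A small difference of this kind would be consistent with the page using the t distribution, although the source does not say which method it uses.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sd, er, conf=0.95):
    """Normal-approximation sample size for estimating a mean to within ±er."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Round up, because a fraction of a subject cannot be recruited
    return ceil((z * sd / er) ** 2)

# IQ example: expected SD of 10, tolerable error of ±2 points
print(sample_size_mean(sd=10, er=2))
```

Because the sample size depends on (SD / er)², halving the tolerable error roughly quadruples the number of subjects needed.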

**Determine error and confidence interval:**

After data collection, we will know the sample size, and the mean and Standard Deviation of the
measurement of interest. We can then nominate the percent confidence interval
(usually 95%). With these, the error can be estimated, and from the error the
actual confidence interval.

**An example**

We proceeded to measure the IQ of 97 university students, and found the mean and
Standard Deviation of IQ in the group to be 110 and 12 respectively.
The error for a 95% confidence interval, as calculated by the program in the Sample Size for Population Mean Page,
is ±2.4. The 95% CI is therefore 110 ± 2.4, or 107.6 to 112.4.
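
The error in this example can be reproduced approximately as er = z × SD / √n. The sketch below assumes the Normal approximation and uses a hypothetical function name, `error_of_mean`. It is not the page's own program, which may use the t distribution instead.

```python
from math import sqrt
from statistics import NormalDist

def error_of_mean(sd, n, conf=0.95):
    """Half-width of the confidence interval of a mean (Normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Standard error of the mean is SD / sqrt(n)
    return z * sd / sqrt(n)

# IQ example: 97 students, mean 110, SD 12
mean, sd, n = 110, 12, 97
er = error_of_mean(sd, n)
print(f"{mean} ± {er:.1f}: {mean - er:.1f} to {mean + er:.1f}")
```

With these inputs the error rounds to 2.4, in agreement with the value quoted above.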

Given that the validity of research results depends on adequate sample size, and
sample size requirements depend on a correct estimate of the population Standard Deviation of the measurement of interest,
an accurate estimation of the population Standard Deviation is a critical starting point in any research.
Despite the urging of professional statisticians, however, it is surprising how infrequently one sees the
precise estimation of Standard Deviations in the medical literature. Often,
published Standard Deviations from small samples are used for sample size estimations as if they were
representative.

Only in medical laboratories and in high-precision engineering is much
attention given to the precise estimation of Standard Deviation. In medical
laboratories, particularly, a precise estimation of Standard Deviation is
necessary in order to establish the normal range of measurements, from which
abnormal values are defined.

**Glossary**

- **Error** is measured as a percentage of the true Standard Deviation value.
- **Confidence interval** depends on the actual Standard Deviation value, from
which the width of the error can be calculated using the percentage error.
- **Sample size** is the number of subjects needed in the survey to achieve the
required precision.
- **Percent confidence interval** is the percent of time that the true
standard deviation will be within the confidence interval, if the survey is repeated
many times.

**Estimate sample size requirement:**
At the planning stage, the parameters needed to estimate the sample size
are the percent confidence interval (usually 95%) and the tolerable error
as a percent of the true Standard Deviation.

**An example**

We have just started a medical testing laboratory and, before establishing
a normal range for blood sugar levels, we wish first to establish its Standard Deviation
amongst normal subjects. We suspect the standard deviation to be around 1, and
we want the error of the 95% confidence interval to be ±0.1 (0.1 / 1 × 100 = 10%).
We can look up the Sample Size to Establish Population Standard Deviation Page
and find that the sample size required is 193 samples of blood.
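
One way to approximate this figure is the large-sample result that the standard error of a sample Standard Deviation is roughly SD / √(2n), which gives n = z² / (2 × er²) when er is the error as a fraction of the Standard Deviation. The sketch below rests on that assumption and uses a hypothetical function name, `sample_size_sd`. The page's own method is not described in the text and may differ.

```python
from math import ceil
from statistics import NormalDist

def sample_size_sd(er_fraction, conf=0.95):
    """Approximate sample size for estimating an SD to within ±er_fraction of itself.

    Assumes a large sample and a roughly Normal measurement, so that the
    standard error of the sample SD is about SD / sqrt(2n).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 / (2 * er_fraction ** 2))

# Blood sugar example: error of ±10% of the Standard Deviation
print(sample_size_sd(0.10))
```

For a 10% error this approximation returns 193, matching the looked-up value. Note that the expected Standard Deviation itself does not enter the calculation, only the error as a fraction of it.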

**Determine error and confidence interval:**

After data collection, we will know the sample size, and the Standard Deviation of the
measurement of interest in that sample. We can then nominate the percent confidence interval
(usually 95%). With these, the error can be estimated, and from the error the
actual confidence interval.

**An example**

We proceeded to measure 193 blood sugars, and found the standard deviation of
blood sugar to be 1.2. The error for a 95% confidence interval, as calculated from the Sample Size to Establish Population Standard Deviation Page,
is ±10%, and
10% of 1.2 is 0.12. The 95% CI of the standard deviation of blood sugar is therefore 1.2 ± 0.12, or 1.08 to 1.32.
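
The percentage error in this example can be approximated by inverting the same large-sample relationship, er = z / √(2n). This is a sketch under that assumption, with a hypothetical function name `error_of_sd`, rather than the calculation the page itself performs.

```python
from math import sqrt
from statistics import NormalDist

def error_of_sd(n, conf=0.95):
    """Approximate error of a sample SD, as a fraction of the SD itself."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Large-sample standard error of an SD, relative to the SD, is 1 / sqrt(2n)
    return z / sqrt(2 * n)

# Blood sugar example: 193 measurements, observed SD of 1.2
sd, n = 1.2, 193
frac = error_of_sd(n)
print(f"error = {frac:.1%}; CI = {sd * (1 - frac):.2f} to {sd * (1 + frac):.2f}")
```

The symmetric interval is itself an approximation. The sampling distribution of a Standard Deviation is skewed, which matters more when the sample is small.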

Establishing a population proportion is a common activity. A politician may ask what
proportion of the population will vote for his party. A drug company may ask what
proportion of a certain age group will have the illness that requires its medicine.
A marketing manager may ask what proportion of teenagers think a particular
clothing style is cool.

**Glossary**

- Although percent is often correctly used, these pages will use the term
**proportion** (prop = number positive / total number), a number between 0 (nobody) and
1 (everybody), in order to avoid confusion with the other percent, the 95% of the
confidence interval.
- The **error** (er) will also be a proportion, and represents half the width of the confidence
interval. The confidence interval will therefore be prop ± er.
- **Sample size** is the number of subjects needed in the survey to achieve the
required precision.
- **Percent confidence interval** is the percent of time that the true
proportion will be within the confidence interval, if the survey is repeated
many times.

There are two common methods of calculation.

The first is to transform a proportion into a mean and a standard error, assuming
an underlying **Normal distribution**. This is the most common method used.
However, the assumption of a Normal distribution becomes increasingly erroneous
when the proportion concerned is close to the extremes (0 or 1), or when the
sample size is very small. The method becomes unacceptable if the confidence
interval overlaps the ends (prop - er < 0 or prop + er > 1), or if the sample size is
less than 10. When this happens, the more precise calculations based on the
Binomial distribution are required.

The more precise method is to base calculations on the **Binomial distribution**,
which truly reflects the behaviour of proportions. It calculates the
probability of obtaining a given count of those with positive attributes in a sample
of defined size, drawn from a population with a defined proportion with positive attributes.
As the Binomial distribution is skewed whenever the proportion is not 0.5, the confidence interval so
calculated is asymmetrical. The calculation also involves repeated evaluation
of the binomial coefficient, which consumes much computing time when the sample
size becomes large.

A common compromise is to use the Normal distribution for calculation unless the
sample size is less than 10 or the confidence interval overlaps 0 or 1,
in which case the Binomial distribution is used.
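
To illustrate the Binomial approach, the sketch below computes an exact interval by the Clopper-Pearson method, finding each limit by bisection on the binomial tail probability. Clopper-Pearson is one common exact method and is chosen here as an assumption; the text does not say which exact calculation the pages use. The function names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) when X is the count of positives in n subjects with proportion p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_ci(k, n, conf=0.95, iters=60):
    """Clopper-Pearson interval for k positives out of n, found by bisection."""
    tail = (1 - conf) / 2

    def solve(too_low):
        # Bisect on p in [0, 1]; too_low(p) is True while p is below the limit sought
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if too_low(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= k) equals the tail probability
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) < tail)
    # Upper limit: the p at which P(X <= k) equals the tail probability
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) > tail)
    return lower, upper

# A small sample where the Normal method is not acceptable: 2 positives out of 8
lower, upper = exact_ci(2, 8)
print(f"prop = 0.25, exact 95% CI = {lower:.3f} to {upper:.3f}")
```

The interval is not symmetrical around 0.25: the upper limit lies further from the observed proportion than the lower limit does. Each call to `binom_cdf` sums up to k + 1 binomial terms, which shows why the method becomes slow for large samples.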

**Estimate sample size requirement:**
At the planning stage, the parameters needed to estimate the sample size
are the percent confidence interval (usually 95%), roughly the proportion we
expect to find (prop), and the error we will tolerate (er). The sample size
required to achieve this level of precision can then be estimated.

**An example**

We wish to establish the proportion of the population that will vote for the
Labour Party at the next election. We suspect this may be 52% (prop = 0.52), and we
want a precision of ±5% (er = 0.05) with a 95% confidence interval.
We can look up the Sample Size to Establish Population Proportions Page
and find that the sample size required is 381 subjects.
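
For comparison, the textbook Normal-approximation formula is n = z² × prop × (1 - prop) / er². The sketch below applies it under that assumption, with a hypothetical function name `sample_size_proportion`. It returns 384, close to but not the same as the 381 quoted above. The source does not state the exact formula used by its program, so the difference is left unexplained here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(prop, er, conf=0.95):
    """Normal-approximation sample size for estimating a proportion to within ±er."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # prop * (1 - prop) is the variance of a single yes/no response
    return ceil(z ** 2 * prop * (1 - prop) / er ** 2)

# Voting example: expected proportion 0.52, tolerable error ±0.05
print(sample_size_proportion(prop=0.52, er=0.05))
```

The required sample size is largest when prop is 0.5 and falls as the expected proportion moves towards 0 or 1, so an expected proportion of 0.5 is the conservative choice when nothing is known in advance.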

**Determine error and confidence interval:**

After data collection, we will know the sample size, and the proportion of
positives in that sample. We can then nominate the percent confidence interval
(usually 95%). With these, the error can be estimated, and from the error the
actual confidence interval.

**An example**

We proceeded to ask 381 people whether they will vote Labour, and
found 150 who said they will (prop = 150 / 381 = 0.39). The error for a 95% confidence interval,
as calculated from the Sample Size to Establish Population Proportions Page,
is ±0.049. The 95% CI is therefore 0.39 ± 0.049, or 0.341 to 0.439 (34% to 44%).
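
The error in this example can be reproduced with the Normal approximation, er = z × √(prop × (1 - prop) / n). The sketch below also applies the rule described earlier for deciding whether the Normal method is acceptable. The function names `error_of_proportion` and `normal_method_ok` are hypothetical, and this is not the page's own program.

```python
from math import sqrt
from statistics import NormalDist

def error_of_proportion(prop, n, conf=0.95):
    """Half-width of the confidence interval of a proportion (Normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sqrt(prop * (1 - prop) / n)

def normal_method_ok(prop, er, n):
    """Rule from the text: at least 10 subjects and an interval that stays within 0 to 1."""
    return n >= 10 and prop - er >= 0 and prop + er <= 1

# Voting example: 150 of 381 people said they will vote Labour
n, positives = 381, 150
prop = positives / n
er = error_of_proportion(prop, n)
print(f"prop = {prop:.2f}, error = ±{er:.3f}, Normal method ok: {normal_method_ok(prop, er, n)}")
```

Here the error rounds to 0.049, as quoted, and the interval stays well inside 0 to 1, so the Normal method is acceptable for this sample.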

Sample size for population proportions and population mean

Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical
Studies. Second Ed. Blackwell Science ISBN 0-86542-870-0 p. 131-135

Sample size for population standard deviation

Greenwood JA and Sandomire MM (1950) Journal of the American Statistical
Association 45 (250) p. 257 - 260

Burnett RW (1975) Accurate estimation of standard deviations for quantitative
methods used in clinical chemistry. Clin. Chem. 21 (13) p. 1935-1938