This page describes the relationship between sample size and error in estimating the
mean of a normally distributed measurement from a sample.
Establishing population means is a frequent research activity, particularly in the educational, social, and biomedical fields. Education departments may wish to know the mathematical abilities of a cohort of school children, obstetricians need to know the normal birth weight, and so on.
Terminology:
- Level of confidence in the results. This is expressed as a percentage, the most commonly used being 95%.
- Error, either defined by the analyst as tolerable during planning or estimated from the results at the end of data collection, is the distance between the mean and either end of the confidence interval, so that the confidence interval is CI = mean ± error. Error can be expressed in Standard Deviation units (z = error / SD), in which case the error in terms of the measurement itself is z × SD.
- Sample size is the number of observations in the data.
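The three terms above are linked by one formula. The following is a sketch that assumes the two-sided t-based interval used by the R code later on this page; the symbols $E$ (error), $s$ (Standard Deviation), $n$ (sample size), $\bar{x}$ (sample mean) and $\alpha$ (1 minus the confidence level as a proportion) are notation introduced here for illustration.

```latex
\[
E = t_{1-\alpha/2,\;n-1}\,\frac{s}{\sqrt{n}}, \qquad
\mathrm{CI} = \bar{x} \pm E, \qquad
z = \frac{E}{s} = \frac{t_{1-\alpha/2,\;n-1}}{\sqrt{n}}
\]
```

Because $z$ depends only on the confidence level and $n$, results can be tabulated for SD = 1 and rescaled to any measurement.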
This panel provides three calculations:
- The first is to estimate the sample size requirement, using confidence level, tolerable error, and assumed Standard Deviation. The input data and results are in actual units of measurement. In the sample size table, however, the error/SD ratio is used, so the results are based on SD=1.
An example: We wish to establish the mean IQ of first year university students. We expect the Standard Deviation of IQ in the cohort to be 10, and we want the 95% confidence interval to be ±2 IQ points. This is an error/SD ratio of z = 2 / 10 = 0.2. The sample size required is 99 subjects.
- The second is to estimate error from the data already collected. This is based on the confidence level desired, and the sample size and Standard Deviation found in the data. The result is the error in actual units of measurement, so that the confidence interval is mean ± error.
An example: We measured the IQ of 97 university students, and found the mean and Standard Deviation of IQ in the group to be 110 and 12 respectively. From this we can establish that the error for the 95% confidence interval is ±2.4. The 95% CI is therefore 110 ± 2.4, or 107.6 to 112.4.
- The third is an exploration of the relationship between sample size and error, and is used in pilot studies when the mean and Standard Deviation are not known. The program estimates the error with increasing sample size, using the value of 1 for the Standard Deviation, so the results are in Standard Deviation units (z). The results are tabulated, allowing the researcher to choose a suitable sample size for a pilot study.
An example: We would like to know the optimum sample size to be used in a pilot study with 95% confidence. Examining the results from the Javascript program, we can conclude that a sample size of between 15 and 20 would allow us to obtain a 95% confidence interval of mean ± 0.5 SDs, or a sample size of approximately 65 to obtain a 95% confidence interval of mean ± 0.25 SDs. We can also conclude that, after the first 20 cases, each increase of 5 cases reduces error by less than 0.1 SDs. If the cost of data collection is great, we may decide to use 20 cases in the pilot study. However, if greater precision is a priority, we may use greater numbers.
Please note: Pilot studies obtain preliminary data that are useful during planning. The sample size is therefore an approximation, determined by a balance between the need for precision and the cost of data collection. The results of pilot studies therefore cannot be used for hypothesis testing or for defining a population parameter. To obtain robust results, the sample size calculation should be used, and the results tested using the error calculation.
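The first two worked examples above can be checked with a few lines of R. This is a minimal sketch, assuming a two-sided t-based interval; the helper names `ci_error` and `min_n` are hypothetical and are not the functions used by this page's calculator.

```r
# Half-width (error) of a two-sided confidence interval for a mean,
# based on the t distribution with n - 1 degrees of freedom.
ci_error <- function(conf, n, sd) {
  alpha <- 1 - conf / 100
  qt(1 - alpha / 2, df = n - 1) * sd / sqrt(n)
}

# Smallest n (searched from 2 upward) whose error does not exceed `err`.
min_n <- function(conf, err, sd, max_n = 100000) {
  n <- 2:max_n
  n[which(ci_error(conf, n, sd) <= err)[1]]
}

n_iq   <- min_n(95, err = 2, sd = 10)    # example 1: error/SD ratio of 0.2
err_iq <- ci_error(95, n = 97, sd = 12)  # example 2: 97 students, SD of 12

n_iq
round(err_iq, 1)
```

Under these assumptions the search should reproduce the 99 subjects and the error of ±2.4 IQ points quoted in the examples.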
Reference
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0. p. 131-135
This sub-panel provides 2 tables for sample size and mean values: the first for sample size, the second for pilot studies.
Table 1. Sample Size for Estimating Means
The table consists of 5 columns, laid out in two side-by-side blocks.
- The first is the tolerable error in Standard Deviation units (z=ER/SD).
- The second to fifth columns are the sample sizes required to estimate the population mean at the 80%, 90%, 95%, and 99% levels of confidence (CF).
- The cells contain the sample size required.
- For example, to have 95% confidence that the mean will fall within ± a fifth of the expected Standard Deviation (z=ER/SD=0.20), the sample size required is 99.
z=ER/SD | CF=80% | CF=90% | CF=95% | CF=99% | | z=ER/SD | CF=80% | CF=90% | CF=95% | CF=99% |
0.02 | 4142 | 6789 | 9620 | 16624 | | 0.04 | 1027 | 1701 | 2404 | 4142 |
0.06 | 459 | 754 | 1066 | 1847 | | 0.08 | 258 | 425 | 602 | 1042 |
0.10 | 166 | 272 | 386 | 666 | | 0.12 | 116 | 190 | 269 | 464 |
0.14 | 86 | 140 | 199 | 342 | | 0.16 | 66 | 108 | 153 | 263 |
0.18 | 52 | 86 | 121 | 209 | | 0.20 | 43 | 70 | 99 | 170 |
0.22 | 36 | 58 | 82 | 141 | | 0.24 | 30 | 49 | 70 | 119 |
0.26 | 26 | 42 | 60 | 102 | | 0.28 | 23 | 37 | 52 | 89 |
0.30 | 20 | 32 | 46 | 78 | | 0.32 | 18 | 29 | 40 | 69 |
0.34 | 16 | 26 | 36 | 62 | | 0.36 | 15 | 23 | 33 | 55 |
0.38 | 13 | 21 | 30 | 50 | | 0.40 | 12 | 19 | 27 | 46 |
0.42 | 11 | 18 | 25 | 42 | | 0.44 | 10 | 16 | 23 | 39 |
0.46 | 10 | 15 | 21 | 36 | | 0.48 | 9 | 14 | 20 | 33 |
0.50 | 9 | 13 | 18 | 31 | | 0.52 | 8 | 12 | 17 | 29 |
0.54 | 8 | 12 | 16 | 27 | | 0.56 | 7 | 11 | 15 | 25 |
0.58 | 7 | 10 | 14 | 24 | | 0.60 | 7 | 10 | 14 | 23 |
0.62 | 6 | 9 | 13 | 22 | | 0.64 | 6 | 9 | 12 | 20 |
0.66 | 6 | 9 | 12 | 20 | | 0.68 | 6 | 8 | 11 | 19 |
0.70 | 5 | 8 | 11 | 18 | | 0.72 | 5 | 8 | 10 | 17 |
0.74 | 5 | 7 | 10 | 16 | | 0.76 | 5 | 7 | 10 | 16 |
0.78 | 5 | 7 | 9 | 15 | | 0.80 | 5 | 7 | 9 | 15 |
0.82 | 4 | 7 | 9 | 14 | | 0.84 | 4 | 6 | 8 | 14 |
0.86 | 4 | 6 | 8 | 13 | | 0.88 | 4 | 6 | 8 | 13 |
0.90 | 4 | 6 | 8 | 12 | | 0.92 | 4 | 6 | 8 | 12 |
0.94 | 4 | 6 | 7 | 12 | | 0.96 | 4 | 5 | 7 | 11 |
0.98 | 4 | 5 | 7 | 11 | |
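For a quick hand check of a table entry, a common shortcut is the normal approximation n ≈ (critical z / (ER/SD))². The sketch below compares that shortcut with a t-based search. It is an illustration only: the function names are hypothetical, and small differences from the tabulated values are possible because the table was produced by a separate program.

```r
# Normal-approximation shortcut: n = (critical z / ratio)^2, rounded up.
n_normal <- function(conf, ratio) {
  alpha <- 1 - conf / 100
  ceiling((qnorm(1 - alpha / 2) / ratio)^2)
}

# t-based search: smallest n with t(n - 1) / sqrt(n) <= ratio.
n_tdist <- function(conf, ratio, max_n = 100000) {
  alpha <- 1 - conf / 100
  n <- 2:max_n
  n[which(qt(1 - alpha / 2, df = n - 1) / sqrt(n) <= ratio)[1]]
}

ratios <- c(0.20, 0.50)
data.frame(ratio   = ratios,
           normal  = sapply(ratios, n_normal, conf = 95),
           t_based = sapply(ratios, n_tdist,  conf = 95))
```

The shortcut tends to understate the requirement when the sample is small, because it ignores the extra uncertainty that comes from estimating the Standard Deviation from the sample.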
Table 2 and Plot. Pilot Study to Estimate Means
The table and plot represent the same data in tabulated and graphic formats. They show how the error of estimation decreases with sample size, and allow the analyst to make an informed decision on the sample size to be used initially in a pilot study, when both the expected mean and Standard Deviation are not known. Please be aware that the y axis of the plot, error, is logarithmic in scale, to make visualization of lower values easier.
SSiz | CF=80% | CF=90% | CF=95% | CF=99% |
5 | 0.69 | 0.95 | 1.24 | 2.06 |
10 | 0.44 | 0.58 | 0.72 | 1.03 |
15 | 0.35 | 0.45 | 0.55 | 0.77 |
20 | 0.30 | 0.39 | 0.47 | 0.64 |
25 | 0.26 | 0.34 | 0.41 | 0.56 |
30 | 0.24 | 0.31 | 0.37 | 0.50 |
35 | 0.22 | 0.29 | 0.34 | 0.46 |
40 | 0.21 | 0.27 | 0.32 | 0.43 |
45 | 0.19 | 0.25 | 0.30 | 0.40 |
50 | 0.18 | 0.24 | 0.28 | 0.38 |
55 | 0.17 | 0.23 | 0.27 | 0.36 |
60 | 0.17 | 0.22 | 0.26 | 0.34 |
65 | 0.16 | 0.21 | 0.25 | 0.33 |
70 | 0.15 | 0.20 | 0.24 | 0.32 |
75 | 0.15 | 0.19 | 0.23 | 0.31 |
80 | 0.14 | 0.19 | 0.22 | 0.30 |
85 | 0.14 | 0.18 | 0.22 | 0.29 |
90 | 0.14 | 0.18 | 0.21 | 0.28 |
95 | 0.13 | 0.17 | 0.20 | 0.27 |
100 | 0.13 | 0.17 | 0.20 | 0.26 |
105 | 0.13 | 0.16 | 0.19 | 0.26 |
110 | 0.12 | 0.16 | 0.19 | 0.25 |
115 | 0.12 | 0.15 | 0.18 | 0.24 |
120 | 0.12 | 0.15 | 0.18 | 0.24 |
125 | 0.12 | 0.15 | 0.18 | 0.23 |
130 | 0.11 | 0.15 | 0.17 | 0.23 |
135 | 0.11 | 0.14 | 0.17 | 0.22 |
140 | 0.11 | 0.14 | 0.17 | 0.22 |
145 | 0.11 | 0.14 | 0.16 | 0.22 |
150 | 0.11 | 0.14 | 0.16 | 0.21 |
155 | 0.10 | 0.13 | 0.16 | 0.21 |
160 | 0.10 | 0.13 | 0.16 | 0.21 |
165 | 0.10 | 0.13 | 0.15 | 0.20 |
170 | 0.10 | 0.13 | 0.15 | 0.20 |
175 | 0.10 | 0.13 | 0.15 | 0.20 |
180 | 0.10 | 0.12 | 0.15 | 0.19 |
185 | 0.09 | 0.12 | 0.15 | 0.19 |
190 | 0.09 | 0.12 | 0.14 | 0.19 |
195 | 0.09 | 0.12 | 0.14 | 0.19 |
200 | 0.09 | 0.12 | 0.14 | 0.18 |
A pilot study is usually conducted in the very beginning of a research project, when the mean and Standard Deviation are not known.
It can be seen from the table and plot that, depending on the needs of the analyst, the sample size can range from 5 cases (error of about z=0.7 SDs at 80% confidence) to 165 cases (error of z=0.2 SDs at 99% confidence). Given that a pilot study only establishes an approximate guideline on how to proceed, and is not a rigorous hypothesis testing exercise, 30 to 60 cases are commonly used, giving an approximate error of ±0.25 to 0.5 Standard Deviations at 80% to 99% confidence.
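The pilot study table can be approximated directly from the t distribution. The following is a minimal sketch, assuming the error in SD units is the two-sided t quantile divided by the square root of the sample size; `pilot_error` and `tab` are illustrative names, not part of the page's own program.

```r
# Error in SD units for a given confidence level and sample size,
# i.e. the t-based half-width with the Standard Deviation set to 1.
pilot_error <- function(conf, n) {
  alpha <- 1 - conf / 100
  qt(1 - alpha / 2, df = n - 1) / sqrt(n)
}

n    <- seq(5, 200, by = 5)
conf <- c(80, 90, 95, 99)

# One column per confidence level, one row per sample size.
tab <- sapply(conf, pilot_error, n = n)
dimnames(tab) <- list(as.character(n), paste0("CF=", conf, "%"))

round(head(tab, 4), 2)
```

A plot with a logarithmic error axis, similar to the one described above, could then be drawn with `matplot(n, tab, type = "l", log = "y")`.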
This sub-panel shows R code for programs related to means.
Section 1: Common subroutines
PtoZ <- function(p) # probability to z
{
  return (-qnorm(p))
}
ZtoP <- function(z) # z to probability
{
  return (pnorm(-z))
}
PtoT <- function(p, degFd, tail) # probability to t
{
  p = p / tail
  return (-qt(p, df=degFd))
}
TtoP <- function(t, degFd, tail) # t to probability
{
  p = pt(-t, df=degFd)
  return (p * tail)
}
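The helper functions above work with tail probabilities, so a one-tail probability of 0.025 corresponds to the familiar z of about 1.96. The short sanity check below repeats the helpers in compact form so that it runs on its own; the checks themselves are illustrative additions and not part of the original program.

```r
# Compact copies of the helpers above, so this snippet is self-contained.
PtoZ <- function(p) -qnorm(p)
ZtoP <- function(z) pnorm(-z)
PtoT <- function(p, degFd, tail) -qt(p / tail, df = degFd)
TtoP <- function(t, degFd, tail) pt(-t, df = degFd) * tail

z95 <- PtoZ(0.025)        # one-tail probability 0.025 -> z of about 1.96
t95 <- PtoT(0.05, 17, 2)  # two-tail probability 0.05 with 17 df

# Round trips should recover the original probabilities.
all.equal(ZtoP(z95), 0.025)
all.equal(TtoP(t95, 17, 2), 0.05)

# Equivalent base R calls using upper-tail quantiles.
all.equal(z95, qnorm(0.025, lower.tail = FALSE))
all.equal(t95, qt(0.025, df = 17, lower.tail = FALSE))
```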
Section 2: Sample Size for Mean
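ssmean <- function(c,er,sd) # c=% confidence, er=tolerable error, sd=SD (er and sd in the same units)
{
  c = (1.0 - c / 100.0)   # alpha: total probability in both tails
  ssL = 1                 # lower bound of the search interval
  ssR = 1e10              # upper bound of the search interval
  ss = 5000               # starting sample size
  se = sd / sqrt(ss)
  t = PtoT(c,ss-1,2)
  e = t * se              # error achieved with the current sample size
  # bisection: narrow the interval until the error matches er or the bounds meet
  while(abs(e-er)>0.0001 && abs(ssL-ssR)>1)
  {
    if(e>er){ssL = ss } else {ssR=ss }   # error too large: more cases needed
    ss = round((ssL+ssR)/2)
    se = sd/sqrt(ss)
    if(ss>5000){t = PtoZ(c/2)} else {t = PtoT(c,ss-1,2)}   # normal approximation for large samples
    e = t * se
  }
  return (ss)
}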
# Testing
txt = ("
Cf SD Err
90 0.4 0.1
95 1.0 0.5
99 2.2 1.3
")
df <- read.table(textConnection(txt),header=TRUE)
df # optional display of data frame
# extract columns as vectors
arCf <- df$Cf
arSD <- df$SD
arErr <- df$Err
# Create result vector
arSSiz <- vector()
for(i in 1:nrow(df))
{
cf = arCf[i]
sd = arSD[i]
er = arErr[i]
arSSiz <- append(arSSiz,ssmean(cf, er, sd)) # append sample size to result array (argument order: confidence, error, SD)
}
# Incorporate results into original data frame
df$SSiz <- arSSiz
# show data frame with results
df
Result: SSiz for Cf% confidence interval of mean±Err
> df
Cf SD Err SSiz
1 90 0.4 0.1 46
2 95 1.0 0.5 18
3 99 2.2 1.3 22
Section 3: Program for Error of Means
errmean <- function(c,n,sd) # c=confidence in %, n=ssiz, sd=SD
{
  c = (1.0 - c / 100.0) # 2 tails for t
  se = sd / sqrt(n)
  t = PtoT(c,n-1,2)
  return (t * se)
}
# Testing
txt = ("
Cf SSiz SD
90 46 0.4
95 18 1.0
99 22 2.2
")
df <- read.table(textConnection(txt),header=TRUE)
df # optional display of data frame
# extract columns as vectors
arCf <- df$Cf
arSSiz <- df$SSiz
arSD <- df$SD
# Create result vector
arEr <- vector()
for(i in 1:nrow(df))
{
cf = arCf[i]
ssiz = arSSiz[i]
sd = arSD[i]
arEr <- append(arEr,errmean(cf,ssiz,sd)) # append error to result array
}
# Incorporate results into original data frame
df$Err <- arEr
# show data frame with results
df
Result: Cf% confidence interval = mean±Err (Err shown to one decimal place)
Cf SSiz SD Err
1 90 46 0.4 0.1
2 95 18 1.0 0.5
3 99 22 2.2 1.3
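Because `qt` and `sqrt` are vectorised, the loop in this section can also be written as a single expression. The sketch below is an alternative formulation, not the page's own program; it recomputes the error for the three rows of test data directly, assuming the same two-sided t-based interval.

```r
# Test data from this section: confidence (%), sample size, SD.
df <- data.frame(Cf   = c(90, 95, 99),
                 SSiz = c(46, 18, 22),
                 SD   = c(0.4, 1.0, 2.2))

# Two-sided t-based error for every row at once.
alpha  <- 1 - df$Cf / 100
df$Err <- qt(1 - alpha / 2, df = df$SSiz - 1) * df$SD / sqrt(df$SSiz)

df
```

The recomputed errors should be close to, but not exactly, the targets of 0.1, 0.5 and 1.3 used in Section 2, because sample sizes are whole numbers.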
Section 4: Pilot for Means
MeanPilot <- function(conf, intv, maxN) # conf=% confidence, intv=step in sample size, maxN=largest sample size
{
  ssiz <- vector()    # sample sizes
  error <- vector()   # error in SD units
  Dec <- vector()     # decrease in error from the previous row
  PcDec <- vector()   # decrease as a percentage of the previous error
  n = intv
  i = 1
  ssiz <- append(ssiz,n)
  error <- append(error, errmean(conf, n, 1))
  Dec <- append(Dec,0)
  PcDec <- append(PcDec,0)
  while(n + intv <= maxN)   # stop before the sample size exceeds maxN
  {
    i = i + 1
    n = n + intv
    ssiz <- append(ssiz,n)
    error <- append(error, errmean(conf, n, 1))
    Dec <- append(Dec,error[i-1] - error[i])
    PcDec <- append(PcDec,Dec[i] / error[i-1] * 100)
  }
  mx <- cbind(ssiz, error, Dec, PcDec)
  df = as.data.frame(mx)
  return(df)
}
# Testing
confidence = 95
interval = 5
maxN = 20
MeanPilot(confidence, interval, maxN)
Results
> MeanPilot(confidence, interval, maxN)
ssiz error Dec PcDec
1 5 1.2416640 0.00000000 0.00000
2 10 0.7153569 0.52630709 42.38724
3 15 0.5537815 0.16157536 22.58668
4 20 0.4680144 0.08576714 15.48754
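The decrease columns can also be computed without a loop. This is a minimal sketch using `diff`, assuming 95% confidence and SD = 1 as in the test above; it is an alternative formulation for illustration rather than the page's own program.

```r
# Error in SD units (SD = 1) at 95% confidence for n = 5, 10, 15, 20.
n     <- seq(5, 20, by = 5)
error <- qt(0.975, df = n - 1) / sqrt(n)

# Absolute and percentage decrease relative to the previous row;
# the first row has no predecessor, so its decrease is set to 0.
Dec   <- c(0, -diff(error))
PcDec <- c(0, -diff(error) / head(error, -1) * 100)

data.frame(ssiz = n, error, Dec, PcDec)
```

Since the error shrinks roughly in proportion to 1/sqrt(n), each additional block of 5 cases buys a smaller improvement than the one before it.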