This page describes the relationship between sample size and the error in estimating a Standard Deviation (SD) from a sample.
Most statistical calculations rest on the critical assumption that the SD of a measurement is known and valid, so the precision of an estimated SD is important.
In many clinical situations an assumed value for the SD is used, taken from published figures or estimated from a small pilot study, but this often results in misleading conclusions.
In engineering and in high precision biochemical laboratories in particular, a valid assumed SD is critical, and the calculations presented in this panel are one of the methods used to obtain it.
The error, that is the confidence interval of the estimated SD, is expressed as a percent of the SD. The greater the sample size, the narrower this confidence interval.
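As a sketch of the quantity behind this confidence interval, assuming the measurements are normally distributed: with sample size n, degrees of freedom nu = n - 1, sample SD s, true SD sigma, and error u expressed as a proportion, the Confidence function in the R code on this page evaluates the probability shown below. The symbols nu, u and F (the chi-square distribution function with nu degrees of freedom) are notation introduced here for illustration.

```latex
\frac{\nu s^{2}}{\sigma^{2}} \sim \chi^{2}_{\nu}, \qquad
P\bigl((1-u)\,\sigma \le s \le (1+u)\,\sigma\bigr)
  = F_{\nu}\bigl((1+u)^{2}\,\nu\bigr) - F_{\nu}\bigl((1-u)^{2}\,\nu\bigr)
```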
Two calculations are offered in this panel:
- At the planning stage of the study, to estimate the sample size required to establish a desired confidence interval.
- After collecting all the data, to estimate the SD and its confidence interval.
The other sub-panels for SD are:
- Javascript program to calculate sample size, and to estimate confidence intervals
- R code to do the same
- Tables of sample size and confidence intervals in the commonly used range of values
References
Greenwood JA and Sandomire MM (1950) Journal of the American Statistical Association 45 (250) p. 257-260
Burnett RW (1975) Accurate estimation of standard deviations for quantitative methods used in clinical chemistry. Clin. Chem. 21 (13) p. 1935-1938
This sub-panel provides the R code for estimating the sample size needed, and the error obtained, when estimating a Standard Deviation. The algorithm is as described in
Burnett RW (1975) Accurate estimation of standard deviations for quantitative methods used in clinical chemistry. Clin. Chem. 21 (13) p. 1935-1938
Section 1. Supportive subroutines
ChiSqToP <- function(chi, degF) # upper-tail probability from chi-square and degrees of freedom
{
  return (1 - pchisq(chi, df=degF)) # probability
}
Confidence <- function(u, df) # common to both SSizSD and ErrSD
{
  # u = error as a proportion of the SD, df = degrees of freedom
  x1 = (1 - u)^2 * df # lower chi-square bound
  x2 = (1 + u)^2 * df # upper chi-square bound
  return (ChiSqToP(x1,df) - ChiSqToP(x2,df)) # probability between the two bounds
}
# Calculate sample size from % confidence and % error
SSizSD <- function(Cf, Er) # Cf = percent confidence, Er = tolerable error (as % of SD found)
{
  Cf = Cf / 100 # converts percent to probability
  Er = Er / 100 # converts percent to proportion of SD
  # bisection search for the degrees of freedom that give the required confidence
  nl = 0
  nr = 100000000
  nm = nr / 2
  p = Confidence(Er, round(nm))
  while(abs(nl-nr)>1)
  {
    if(p<Cf) { nl = nm } else { nr = nm }
    nm = (nr + nl) / 2.0
    p = Confidence(Er, round(nm))
  }
  return (ceiling(nm) + 1) # nm is df, so sample size is df plus 1
}
# Calculate error from % confidence and sample size
ErrSD <- function(cf, ssiz) # cf = percent confidence, ssiz = sample size
{
  cf = cf / 100 # converts percent to probability
  df = ssiz - 1 # degrees of freedom
  # bisection search for the error (as a proportion of SD)
  ul = 0
  ur = 1
  um = 0.5
  p = Confidence(um, df)
  while(abs(ul-ur)>0.0001)
  {
    if(p>cf) { ur = um } else { ul = um }
    um = (ul + ur) / 2.0
    p = Confidence(um, df)
  }
  return (um * 100) # error as % of SD
}
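The bisection in ErrSD can also be written with R's built-in root finder uniroot(). The following is a minimal sketch under stated assumptions, not the code used on this page: the names coverage and err_sd_uniroot are hypothetical, the data are assumed to be normally distributed, and the sample is assumed large enough for the error to lie below 100% of the SD.

```r
# Probability that the sample SD lies within +/- u (a proportion) of the
# true SD, for df degrees of freedom. Same quantity as Confidence(u, df).
coverage <- function(u, df) {
  pchisq((1 + u)^2 * df, df) - pchisq((1 - u)^2 * df, df)
}

# Hypothetical alternative to ErrSD: solve coverage(u, df) = confidence for u.
err_sd_uniroot <- function(cf_pct, n) {
  df <- n - 1                                   # degrees of freedom
  f  <- function(u) coverage(u, df) - cf_pct / 100
  # f is negative at u = 0 and, for a large enough sample, positive at u = 1
  uniroot(f, lower = 0, upper = 1, tol = 1e-9)$root * 100
}

err_sd_uniroot(90, 137)   # error as % of SD for 90% confidence, n = 137
```

For 90% confidence and a sample size of 137, the value should land close to the 9.97% reported in the results of Program 2.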
Section 2. Main Programs
Program 1. Sample size
txt = ("
Cf Err
90 10
95 5
99 1
") # input data Cf=% confidence Err= error in % of SD
df <- read.table(textConnection(txt),header=TRUE)
#df # optional display of input data frame
# extract arrays from data frame
arCf <- df$Cf
arErr <- df$Err
# Create results array
arSSiz <- vector()
# Calculate
for(i in 1:nrow(df))
{
  cf = arCf[i]
  er = arErr[i]
  arSSiz <- append(arSSiz, SSizSD(cf, er)) # append sample size to result array
}
# Incorporate results into original data frame
df$SSiz <- arSSiz
# show data frame with results
df
The results are as follows
> df
  Cf Err  SSiz
1 90  10   137
2 95   5   769
3 99   1 33175
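The first row of these results (90% confidence, 10% error, sample size 137) can be checked by simulation. The sketch below is not part of the original programs. It assumes normally distributed data with a true SD of 1 (an arbitrary choice) and counts how often the SD of a sample of 137 falls within 10% of the true SD. The variable names and the number of replicates are illustrative.

```r
set.seed(1)            # fixed seed so the run is repeatable
n     <- 137           # sample size from the first row of the results
u     <- 0.10          # tolerable error as a proportion of the SD
sigma <- 1             # assumed true SD (arbitrary)
reps  <- 20000         # number of simulated samples (illustrative)

# SD of each simulated sample
s <- replicate(reps, sd(rnorm(n, mean = 0, sd = sigma)))

# proportion of sample SDs within +/- u of the true SD
covered <- mean(abs(s - sigma) <= u * sigma)
covered
```

If the calculation holds, the proportion should be close to 0.90, subject to simulation noise.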
Program 2. Error as % of SD found
txt = ("
Cf SSiz
90 137
95 769
99 33175
") # input data Cf=% confidence, SSiz = sample size
df <- read.table(textConnection(txt),header=TRUE)
#df # optional display of input data frame
# extract arrays from data frame
arCf <- df$Cf
arSSiz <- df$SSiz
# Create results array
arEr <- vector()
# Calculate
for(i in 1:nrow(df))
{
  cf = arCf[i]
  ssiz = arSSiz[i]
  #print(c(cf,ssiz)) # optional display of the current inputs
  arEr <- append(arEr, ErrSD(cf, ssiz)) # append error to result array
}
# Incorporate results into original data frame
df$Error <- arEr
# show data frame with results
df
The results are
> df
  Cf  SSiz     Error
1 90   137 9.9700928
2 95   769 5.0018311
3 99 33175 0.9979248
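As a rough cross-check on these figures, for large samples the standard error of a sample SD is approximately the SD divided by sqrt(2(n - 1)), so the error as a percent of the SD is about 100 z / sqrt(2(n - 1)), where z is the two-sided normal quantile for the chosen confidence. The sketch below uses this normal approximation, which is not the chi-square method of the programs above. The function name approx_err_pct and the data frame name res are hypothetical.

```r
# Large-sample normal approximation to the error (as % of SD).
approx_err_pct <- function(cf_pct, n) {
  z <- qnorm(1 - (1 - cf_pct / 100) / 2)   # two-sided normal quantile
  100 * z / sqrt(2 * (n - 1))
}

# Same inputs as Program 2
res <- data.frame(Cf = c(90, 95, 99), SSiz = c(137, 769, 33175))
res$ApproxError <- mapply(approx_err_pct, res$Cf, res$SSiz)
res
```

With samples this large, the approximate values should agree closely with the Error column above.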