Introduction
This page presents methods for estimating the sample size required to establish a proportion. This is frequently needed to establish the frequency of events, such as death rates, complication rates, and success rates.
The first approach is to assume that the binomial distribution is approximately the same as the normal distribution, with the Standard Error depending on the proportion and the sample size. From this, the required sample size and the resulting error can be calculated according to the normal distribution model.
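In symbols, matching the SampleSizeNorm and ErrorNorm functions in the R Codes panel, where $p$ is the expected proportion, $E$ the tolerable error, and $z_{\alpha/2}$ the normal deviate for the chosen confidence (1.96 for 95% confidence):

$$n = \left\lceil \frac{z_{\alpha/2}^{2}\, p(1-p)}{E^{2}} \right\rceil \qquad\text{and}\qquad E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$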
It is also recognised that the binomial distribution is only similar to the normal distribution when the proportion is near 0.5, as the confidence interval cannot overlap 0 or 1. The distribution becomes increasingly skewed as the proportion moves away from 0.5, and this skewness is pronounced when the sample size is small. The second approach is therefore to accept that the binomial distribution is not the same as the normal distribution: it is skewed, narrower towards the extremes (0 or 1) and wider towards the middle (0.5). When setting the parameters for calculation, the errors on the two sides of the proportion must be recognised and specified separately, as lower (towards 0) or higher (towards 1).
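As a quick illustration of this skewness (not part of the page's own programs; a sketch using base R's qbinom with assumed example values):

```r
# Exact binomial 95% limits for an expected proportion of 0.1 in a sample of 30
n <- 30
p <- 0.1
lower <- qbinom(0.025, n, p) / n             # lower limit as a proportion
upper <- qbinom(0.975, n, p) / n             # upper limit as a proportion
c(errLow = p - lower, errHigh = upper - p)   # the two errors differ, reflecting the skew
```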
Terminology

When the proportion is not near the extremes (say 0.15 to 0.85), and the sample size is large (say both the number of positive and negative cases exceed 20), the results of calculations assuming the normal and the binomial distribution are similar, and the errors on both sides are symmetrical. When the proportion is nearer the extremes (<0.15 or >0.85) and the sample size is likely to be small, the more robust binomial distribution should be used.

References
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0. p. 135
Javascript Program

This sub-panel provides two programs: one to estimate the sample size required to find a proportion at the planning stage, and one to estimate the error of the confidence interval once the data are available.
Program 1: Sample Size for Proportion
Program 2: Error Estimation for Proportion
R Codes

This sub-panel presents the calculations of sample size and error for proportions, in R code.
The algorithms are essentially the same as those in the Javascript program, with minor alterations to comply with the format of R programming. The algorithms are based on Machin et al. (1997), referenced in the Introduction panel.
Section 1. Supportive functions used by all subsequent functions

Section 1.a. Global array of log(factorial) values for iterative binomial coefficients. arLogFact is an array of log(factorial) values. It is created just once, so that repeated testing of binomial coefficients does not require prolonged and repeated calculation of factorials.

```r
arLogFact <- vector()                  # global array of log(factorial) values

MakeLogFactArray <- function(n)        # create a vector of log(factorials) from 0 to n
{
  arLogFact <<- vector()               # clear the array
  x = 0
  arLogFact <<- append(arLogFact, x)   # log(0!) = 0
  for(i in 1:n)
  {
    x = x + log(i)                     # log(i!) = log((i-1)!) + log(i)
    arLogFact <<- append(arLogFact, x)
  }
}
```

Section 1.b. Functions for sample size and error using the normal distribution

```r
PtoZ <- function(p)      # z value from probability
{
  return (-qnorm(p))
}

ZtoP <- function(z)      # probability of z
{
  return (pnorm(-z))
}

SampleSizeNorm <- function(cf, prop, er)   # sample size for an infinite population, large sample
{
  # cf = percent confidence, prop = expected proportion, er = tolerable error
  za = PtoZ((1.0 - cf / 100.0) / 2.0)
  return (ceiling(prop * (1.0 - prop) * za * za / (er * er)))
}

ErrorNorm <- function(cf, n, prop)   # confidence interval for an infinite population, large sample
{
  # cf = percent confidence, n = sample size, prop = proportion found
  za = PtoZ((1.0 - cf / 100.0) / 2.0)
  return (za * sqrt(prop * (1 - prop) / n))
}
```

Section 2. Functions for calculating error and sample size using the binomial distribution

Section 2.a. Binomial coefficient and probability

```r
LogBinomCoeff <- function(n, k)   # logarithm of the binomial coefficient of n and k
{
  return (arLogFact[n + 1] - arLogFact[k + 1] - arLogFact[n - k + 1])
}

p_bin <- function(p, n, k)   # probability of observing k positive cases in a sample of n,
{                            # given that the reference probability is p
  return (exp(LogBinomCoeff(n, k) + log(p) * k + log(1 - p) * (n - k)))
}
```

Section 2.b. Error using the binomial distribution

```r
ErrorBinom <- function(c, n, prop, typ)   # confidence interval by the binomial distribution (formula 6.5)
{
  # typ: 0 = low side only, 1 = high side only, 2 = both low and high
  alpha = (1.0 - c / 100.0) / 2.0         # alpha in formula 6.5
  nPos = prop * n
  i = 0
  p = p_bin(prop, n, i)
  while(p < alpha && i < nPos)            # accumulate the lower tail (formula 6.5)
  {
    i = i + 1
    p = p + p_bin(prop, n, i)
  }
  errLow = prop - i / n                   # error on the low (towards 0) side
  if(typ == 0) return (errLow)
  alpha = 1 - alpha
  while(p < alpha)                        # accumulate up to the upper tail (formula 6.5)
  {
    i = i + 1
    p = p + p_bin(prop, n, i)
  }
  errHigh = i / n - prop                  # error on the high (towards 1) side
  if(typ == 1) return (errHigh)
  return (c(errLow, errHigh))
}
```

Section 2.c. Sample size using the binomial distribution

```r
SampleSizeBinomLow <- function(c, prop, ci)   # sample size, error on the low side, binomial
{
  ssiz = SampleSizeNorm(c, prop, ci)   # initial estimate using the normal distribution
  ussiz = ssiz * 2                     # upper bound for the binary search
  lssiz = round(ssiz / 2)              # lower bound for the binary search
  er = ErrorBinom(c, ssiz, prop, 0)
  while(abs(ussiz - lssiz) > 3)        # binary search for the required sample size
  {
    if(er < ci) {ussiz = ssiz} else {lssiz = ssiz}
    ssiz = round((ussiz + lssiz) / 2)
    er = ErrorBinom(c, ssiz, prop, 0)
  }
  return (ceiling((ussiz + lssiz) / 2))
}

SampleSizeBinomHigh <- function(c, prop, ci)   # sample size, error on the high side, binomial
{
  ssiz = SampleSizeNorm(c, prop, ci)   # initial estimate using the normal distribution
  ussiz = ssiz * 2
  lssiz = round(ssiz / 2)
  er = ErrorBinom(c, ssiz, prop, 1)
  while(abs(ussiz - lssiz) > 3)        # binary search for the required sample size
  {
    if(er > ci) {lssiz = ssiz} else {ussiz = ssiz}
    ssiz = round((ussiz + lssiz) / 2)
    er = ErrorBinom(c, ssiz, prop, 1)
  }
  return (ceiling((ussiz + lssiz) / 2))
}
```
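These functions can also be called directly. A minimal usage sketch follows: the inputs are those of the second row of the results in Section 3.1, and the table size of 2000 is an arbitrary value large enough for the searches involved; the expected outputs are taken from that results row.

```r
MakeLogFactArray(2000)               # log(factorial) table large enough for the searches below
SampleSizeNorm(95, 0.2, 0.05)        # normal approximation: 246
SampleSizeBinomLow(95, 0.2, 0.05)    # binomial, error on the low side: 237
SampleSizeBinomHigh(95, 0.2, 0.05)   # binomial, error on the high side: 255
```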
Section 3. Main programs with data I/O

Section 3.1. Sample Size

```r
maxSSiz = 1000   # default maximum sample size for the log(factorial) table
txt = ("
  Cf  Prop  Err
  90  0.4   0.1
  95  0.2   0.05
  99  0.1   0.01
")
df <- read.table(textConnection(txt), header=TRUE)
#df                  # optional display of the input data frame

# extract columns as vectors
arCf   <- df$Cf      # array of % confidence
arProp <- df$Prop    # array of expected proportions
arErr  <- df$Err     # array of tolerable errors

# create result vectors
arSSNorm    <- vector()   # sample size, normal distribution
arSSBinLow  <- vector()   # sample size, binomial, error on the lower (towards 0) side
arSSBinHigh <- vector()   # sample size, binomial, error on the higher (towards 1) side

for(i in 1:nrow(df))      # first run: sample size by the normal distribution for each row of data
{
  cf   = arCf[i]          # % confidence
  prop = arProp[i]        # expected proportion
  er   = arErr[i]         # tolerable error
  sNorm = SampleSizeNorm(cf, prop, er)   # sample size, normal distribution
  arSSNorm <- append(arSSNorm, sNorm)    # add to array
  if(sNorm > maxSSiz) maxSSiz = sNorm    # enlarge the table size if necessary
}
maxSSiz = maxSSiz * 2
MakeLogFactArray(maxSSiz)   # create the vector of log(factorials)

for(i in 1:nrow(df))        # second run: sample size by the binomial distribution for each row of data
{
  cf   = arCf[i]            # % confidence
  prop = arProp[i]          # expected proportion
  er   = arErr[i]           # tolerable error
  if((prop - er) < 0)       # error overlaps the lower limit 0
  {
    arSSBinLow <- append(arSSBinLow, "*")
  }
  else
  {
    arSSBinLow <- append(arSSBinLow, SampleSizeBinomLow(cf, prop, er))    # append sample size to result array
  }
  if((prop + er) > 1)       # error overlaps the upper limit 1
  {
    arSSBinHigh <- append(arSSBinHigh, "*")
  }
  else
  {
    arSSBinHigh <- append(arSSBinHigh, SampleSizeBinomHigh(cf, prop, er)) # append sample size to result array
  }
}

# incorporate the result arrays into the original data frame
df$SSNorm    <- arSSNorm
df$SSBinLow  <- arSSBinLow
df$SSBinHigh <- arSSBinHigh

# display data and results
df
```

The results are as follows

```
> df
  Cf Prop  Err SSNorm SSBinLow SSBinHigh
1 90  0.4 0.10     65       61        67
2 95  0.2 0.05    246      237       255
3 99  0.1 0.01   5972     5785      6064
```

Interpreting the results

Each row shows the sample size required for the stated confidence, expected proportion, and tolerable error. SSNorm assumes the normal distribution, with symmetrical error; SSBinLow and SSBinHigh use the binomial distribution, for a tolerable error on the lower (towards 0) or higher (towards 1) side respectively. A "*" indicates that the proportion and error overlap 0 or 1, so no sample size can be calculated on that side.
Section 3.2. Error

```r
maxSSiz = 1000   # default maximum sample size for the log(factorial) table
txt = ("
  Cf  SSiz  Prop
  90  65    0.4
  90  61    0.4
  95  67    0.2
  95  237   0.2
  99  5972  0.1
")
df <- read.table(textConnection(txt), header=TRUE)
#df                  # optional display of the data frame

# extract columns as vectors
arCf   <- df$Cf      # array of % confidence
arSSiz <- df$SSiz    # array of sample sizes of the data
arProp <- df$Prop    # array of proportions found

# create result vectors
arErNorm    <- vector()   # error, normal distribution
arErBinLow  <- vector()   # error, binomial distribution, lower (towards 0) side
arErBinHigh <- vector()   # error, binomial distribution, upper (towards 1) side

for(i in 1:nrow(df))   # first run: error by the normal distribution, and reset the maximum sample size
{
  cf   = arCf[i]       # % confidence
  ssiz = arSSiz[i]     # sample size
  prop = arProp[i]     # proportion
  arErNorm <- append(arErNorm, ErrorNorm(cf, ssiz, prop))   # append error, normal distribution
  if(ssiz > maxSSiz) maxSSiz = ssiz                         # adjust the maximum sample size
}
maxSSiz = maxSSiz * 2
MakeLogFactArray(maxSSiz)   # set up the array of log(factorial) values

for(i in 1:nrow(df))        # second run: error by the binomial distribution
{
  cf   = arCf[i]            # % confidence
  ssiz = arSSiz[i]          # sample size
  prop = arProp[i]          # proportion
  arBinErr <- ErrorBinom(cf, ssiz, prop, 2)
  arErBinLow  <- append(arErBinLow,  arBinErr[1])   # append error on the lower side
  arErBinHigh <- append(arErBinHigh, arBinErr[2])   # append error on the upper side
}

# incorporate the results into the original data frame
df$ErNorm    <- arErNorm
df$ErBinLow  <- arErBinLow
df$ErBinHigh <- arErBinHigh

# display data and results
df
```

The results are as follows

```
> df
  Cf SSiz Prop      ErNorm    ErBinLow  ErBinHigh
1 90   65  0.4 0.099948481 0.092307692 0.10769231
2 90   61  0.4 0.103173452 0.104918033 0.10819672
3 95   67  0.2 0.095779084 0.095522388 0.09850746
4 95  237  0.2 0.050925337 0.048101266 0.05316456
5 99 5972  0.1 0.009999503 0.009912927 0.01018084
```

Interpreting the results

Each row shows, for the stated confidence, sample size, and proportion found, the symmetrical error assuming the normal distribution (ErNorm), and the asymmetrical errors on the lower (ErBinLow) and upper (ErBinHigh) sides using the binomial distribution.
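The error functions can also be called directly. A minimal sketch follows: the inputs are those of the first row of the results above, and the table size of 130 is simply an arbitrary value large enough to cover n = 65; the expected outputs are taken from that results row.

```r
MakeLogFactArray(130)        # log(factorial) table covering n = 65
ErrorNorm(90, 65, 0.4)       # symmetrical error, normal distribution: about 0.0999
ErrorBinom(90, 65, 0.4, 2)   # lower and upper errors, binomial: about 0.0923 and 0.1077
```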
Tables

This sub-panel provides 4 tables of sample sizes for estimating proportions, with 80%, 90%, 95%, and 99% confidence intervals.
Table 1: Sample size for 80% confidence
Table 2: Sample size for 90% confidence
Table 3: Sample size for 95% confidence
Table 4: Sample size for 99% confidence
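The tables themselves are generated by the page. As an illustration only, a similar table can be produced with the SampleSizeNorm function from the R Codes panel; the grids of proportions and errors below are assumed example values, not the page's own:

```r
props <- seq(0.1, 0.5, by = 0.1)     # example grid of expected proportions (assumed values)
errs  <- c(0.01, 0.02, 0.05, 0.10)   # example grid of tolerable errors (assumed values)
tab <- outer(props, errs, Vectorize(function(p, e) SampleSizeNorm(95, p, e)))
dimnames(tab) <- list(Proportion = props, Error = errs)
tab   # sample sizes for 95% confidence by the normal approximation
```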