Extensive instructions and literature exist for Matched Pair Controlled studies, and the subject will not be covered in detail or with authority on this page. Discussions are primarily to orient the user to the resources available here. It is assumed that users already understand how the model is used or have access to expert help on the subject.
This page presents calculations for sample size determination during the planning stage, and estimation of power once data collection is completed.
A matched pair study consists of selecting an Index case and matching it with 1 or more Control cases that have similar characteristics.
Matched pair studies can be either prospective or retrospective. Although the method of data analysis differs accordingly, the sample size and power estimations for the two models, as presented on this page, are similar.
- An example of a prospective study is to match a smoker with a number of non-smokers, and follow them to see if the rates of developing lung cancer differ.
- An example of a retrospective study is to match a mother who gave birth to a baby with a limb defect with 1 or more mothers who had babies with no defect, and compare their exposure to a teratogenic agent during their pregnancies.
Terms used on this page
An Index case is a case with the characteristic being investigated. In the smoking study, an Index case is a person who smokes; in the limb defect study, an Index case is a mother whose baby has a limb defect.
A Control case is one without the characteristic in question, but otherwise as similar to the Index case as the research conditions allow.
A cluster is 1 Index case plus 1 or more matched Control cases.
The ratio of the cluster Ra is the ratio of Control to Index cases, Ra = NC / NI. As each Index case is matched with 1 or more Control cases, Ra is always >= 1
lambda (λ) is the proportion of positives, either expected at the planning stage, or observed once the data is collected
- lambdaC (λC) is the proportion of positives in the Control group
- lambdaI (λI) is the proportion of positives in the Index group
Sample Size (SSiz) in this model represents the number of clusters, each cluster containing 1 Index case plus 1 or more Control cases
Alpha (α, p) is the probability of Type I Error, to be used to determine statistical significance
Beta (β) is the probability of Type II Error. Power = 1 - β
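For orientation, the quantities defined above combine as follows in the two R programs further down this page. The formulas are transcribed from that code rather than quoted from the reference; Ω and Π correspond to the program variables `omega` and `bigPi`, z with a subscript denotes the standard normal quantile (one-tailed), Φ is the standard normal cumulative distribution, and n is the number of clusters when estimating power.

```latex
\begin{aligned}
\Omega &= \frac{\lambda_I}{1 + \lambda_I - \lambda_C}, \qquad
\Pi = \frac{R_a \lambda_C + \Omega}{1 + R_a} \\[4pt]
SSiz &= \left\lceil
  \frac{\left[ z_{1-\alpha}\sqrt{(1 + 1/R_a)\,\Pi(1-\Pi)}
        + z_{1-\beta}\sqrt{\lambda_C(1-\lambda_C)/R_a + \Omega(1-\Omega)} \right]^2}
       {(\lambda_C - \Omega)^2}
  \right\rceil \\[4pt]
\text{Power} &= \Phi\!\left(
  \frac{\sqrt{n}\,\lvert \lambda_C - \Omega \rvert
        - z_{1-\alpha}\sqrt{(1 + 1/R_a)\,\Pi(1-\Pi)}}
       {\sqrt{\lambda_C(1-\lambda_C)/R_a + \Omega(1-\Omega)}}
  \right)
\end{aligned}
```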
Examples
Please note that the following examples use artificially generated numbers to demonstrate the statistical procedures, and are not reflective of reality.
Sample size
We suspect that taking a particular drug during pregnancy may cause the baby to develop an unusual malformation, and wish to test this. As both the malformation and exposure to the drug are uncommon, and ethical considerations preclude a prospective trial, we use the Retrospective Matched Pair Control Model.
We think that 1% (λI=0.01) of mothers who have malformed babies might have taken this drug, and 0.1% (λC=0.001) of mothers with normal babies might have taken this drug.
For every mother with a malformed baby, we will pick 10 mothers (Ra = NC/NI = 10) who are demographically similar but have normal babies, and we want to know how many clusters, each with 1 Index and 10 Control cases, we will need.
Using α = 0.05, power = 0.8, λC = 0.001, λI = 0.01, Ra = 10, the sample size required is 311 clusters, each with 1 mother of a malformed baby plus 10 similar mothers of babies with no malformation.
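The figure of 311 clusters can be reproduced with a short function. This is a minimal sketch that applies the same one-tailed formula as Program 1 further down this page; the function name `matched_ssiz` and its argument names are illustrative and not part of the original page.

```r
# Hypothetical helper (name and signature are illustrative only).
# Same one-tailed formula as Program 1 on this page, written as a function.
matched_ssiz <- function(alpha, power, lambdaC, lambdaI, ra) {
  za <- qnorm(1 - alpha)                        # one-tailed z for Type I error
  zb <- qnorm(power)                            # z for the requested power
  omega <- lambdaI / (1 + lambdaI - lambdaC)    # same as (lambdaC + diff) / (1 + diff)
  bigPi <- (ra * lambdaC + omega) / (1 + ra)    # pooled proportion across the cluster
  ps <- za * sqrt((1 + 1 / ra) * bigPi * (1 - bigPi)) +
        zb * sqrt(lambdaC * (1 - lambdaC) / ra + omega * (1 - omega))
  ceiling(ps^2 / (lambdaC - omega)^2)           # number of clusters, rounded up
}

# Worked example from the text: 10 controls per index case
n_clusters <- matched_ssiz(alpha = 0.05, power = 0.8,
                           lambdaC = 0.001, lambdaI = 0.01, ra = 10)
n_clusters   # 311
```

Because the arithmetic uses only vectorised operations, the function also accepts vectors of equal length for each argument, should several scenarios need to be compared at once.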
Power calculation
In the study of drug exposure during pregnancy and malformed babies, we managed to collect data from 311 clusters, each with 1 mother of a malformed baby and 10 similar mothers of non-malformed babies (311 index and 3110 control cases).
We found that 5 of the 311 mothers in the malformed babies group had taken the drug being investigated (λI = 5 / 311 = 0.016 or 1.6%), and 4 of the 3110 mothers in the normal babies group had taken the drug (λC = 4 / 3110 = 0.0013 or 0.13%). The power of the data is 0.87, suggesting that the study has sufficient power (at p < 0.05 and power > 0.8) to allow confident interpretation of the results.
We then used the odds ratio (see subject index) to evaluate the data obtained. The Index group (with malformation) had 5 mothers exposed to the drug (Pos=5) and 306 mothers not so exposed (Neg=306), giving odds of 5 / 306 = 0.0163. The Control group (without malformation) had 4 mothers exposed to the drug (Pos=4) and 3106 mothers not exposed (Neg=3106), giving odds of 4 / 3106 = 0.0013. The odds ratio is 0.0163 / 0.0013 = 12.69 (calculated from the unrounded odds), with the 1 tail 95% confidence interval >4.2. In other words, the odds of exposure to the drug in mothers with malformed babies were more than 4.2 times those in mothers with non-malformed babies. We can therefore conclude that delivering a baby with malformation is related to prior exposure to the drug during pregnancy.
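The odds ratio arithmetic above can be checked with a few lines of R. This is a sketch under a stated assumption: the page does not say how its confidence limit was obtained, so the lower limit here uses the common log odds ratio (Woolf) standard error, which treats the two groups as unmatched. The variable names are illustrative.

```r
# Counts from the worked example above (artificial data)
index_pos <- 5; index_neg <- 306     # mothers of malformed babies: exposed / not exposed
ctrl_pos  <- 4; ctrl_neg  <- 3106    # mothers of normal babies: exposed / not exposed

odds_ratio <- (index_pos / index_neg) / (ctrl_pos / ctrl_neg)

# Assumption: standard error of log(OR) by the Woolf (logit) method,
# which ignores the matching; the original page does not state its method.
se_log_or <- sqrt(1 / index_pos + 1 / index_neg + 1 / ctrl_pos + 1 / ctrl_neg)

# One-tailed 95% lower confidence limit
lower_1tail <- exp(log(odds_ratio) - qnorm(0.95) * se_log_or)

round(c(OR = odds_ratio, lower95 = lower_1tail), 2)
```

Under this assumption the odds ratio comes to about 12.69 and the lower limit to roughly 4.2, in line with the figures quoted above.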
References
Machin D, Campbell M, Fayers P, Pinol A (1997) Sample Size Tables for Clinical Studies. Second Ed. Blackwell Science. ISBN 0-86542-870-0. p. 145-146
Program 1: Sample Size
# Sample size program
# data entry
dat <- ("
Alpha Power LambdaC LambdaI Ra
0.05 0.8 0.001 0.01 5
0.01 0.8 0.002 0.02 1
")
df <- read.table(textConnection(dat), header=TRUE) # conversion to data frame
# vector to hold sample size (number of matched clusters), 1 tail
SSiz <- vector()
# Calculations, one row of input at a time
for(i in 1 : nrow(df))
{
  alpha <- df$Alpha[i]
  beta <- 1 - df$Power[i]
  lambdaC <- df$LambdaC[i]
  lambdaI <- df$LambdaI[i]
  ra <- df$Ra[i]
  diff <- lambdaI - lambdaC
  za <- qnorm(alpha)   # 1 tail z for Type I error (negative; sign cancels when ps is squared)
  zb <- qnorm(beta)    # z for Type II error (negative; sign cancels when ps is squared)
  omega <- (lambdaC + diff) / (1 + diff)
  bigPi <- (lambdaC / (1 + ra)) * (ra + omega / lambdaC)
  ps <- za * sqrt((1 + 1 / ra) * bigPi * (1 - bigPi)) +
        zb * sqrt(lambdaC * (1 - lambdaC) / ra + omega * (1 - omega))
  SSiz <- append(SSiz, ceiling(ps^2 / (lambdaC - omega)^2)) # round up to whole clusters
}
df$SSiz <- SSiz
df # data frame with input data and resulting sample size in number of paired clusters
The results are as follows
- Alpha = Probability of Type I Error (p, α)
- Power = 1 - β, where β is probability of Type II Error
- LambdaC (λC) = expected proportion positive in control group
- LambdaI (λI)= expected proportion positive in indexed group
- Ra = ratio of cases between control and index groups Ra = NC / NI
- SSiz = number of paired clusters required (e.g. 381 clusters of 5 control + 1 indexed cases from row 1), 1 tail
> df # data frame with input data and resulting sample size in number of paired clusters
Alpha Power LambdaC LambdaI Ra SSiz
1 0.05 0.8 0.001 0.01 5 381
2 0.01 0.8 0.002 0.02 1 689
Program 2: Power
# Power Estimation
# data entry
dat <- ("
Alpha SSiz LambdaC LambdaI Ra
0.05 381 0.001 0.01 5
0.01 689 0.002 0.02 1
")
df <- read.table(textConnection(dat), header=TRUE) # conversion to data frame
# vector to receive result (Power, 1 tail)
Power <- vector()
# Calculations, one row of input at a time
for(i in 1 : nrow(df))
{
  alpha <- df$Alpha[i]
  n <- df$SSiz[i]      # number of matched clusters
  lambdaC <- df$LambdaC[i]
  lambdaI <- df$LambdaI[i]
  ra <- df$Ra[i]
  diff <- lambdaI - lambdaC
  za <- abs(qnorm(alpha))   # 1 tail z for Type I error, as a positive value
  omega <- (lambdaC + diff) / (1.0 + diff)
  bigPi <- (lambdaC / (1.0 + ra)) * (ra + omega / lambdaC)
  zb <- (sqrt(n) * abs(lambdaC - omega) -
         za * sqrt((1.0 + 1.0 / ra) * bigPi * (1.0 - bigPi))) /
        sqrt(lambdaC * (1.0 - lambdaC) / ra + omega * (1.0 - omega))
  Power <- append(Power, pnorm(zb)) # convert z to power
}
# Add vector to data frame for display
df$Power <- Power
df # display input data and Power (1 tail)
The results are as follows
- Alpha = probability of Type I Error used to determine statistical significance (p, α)
- SSiz = the number of paired clusters in the data (each cluster containing 1 indexed case plus Ra control cases)
- LambdaC (λC) is the proportion of positives in the control cases
- LambdaI (λI) is the proportion of positives in the indexed cases
- Ra is the ratio of control over indexed cases in each cluster (Ra = NC / NI)
- Power = 1 - β where β is the probability of Type II Error (1 tail)
> df # display input data and Power (1 tail)
Alpha SSiz LambdaC LambdaI Ra Power
1 0.05 381 0.001 0.01 5 0.8000149
2 0.01 689 0.002 0.02 1 0.8001534
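The same power calculation can also be written as a vectorised function, which avoids the explicit loop and `append`. This is only a sketch of an alternative layout, using the formula from Program 2 unchanged; the name `matched_power` and its arguments are illustrative and not part of the original page.

```r
# Hypothetical helper (name and signature are illustrative only).
# Same one-tailed formula as Program 2, applied to whole vectors at once.
matched_power <- function(alpha, n, lambdaC, lambdaI, ra) {
  za <- qnorm(1 - alpha)                        # one-tailed z for Type I error
  omega <- lambdaI / (1 + lambdaI - lambdaC)    # same as (lambdaC + diff) / (1 + diff)
  bigPi <- (ra * lambdaC + omega) / (1 + ra)
  zb <- (sqrt(n) * abs(lambdaC - omega) -
         za * sqrt((1 + 1 / ra) * bigPi * (1 - bigPi))) /
        sqrt(lambdaC * (1 - lambdaC) / ra + omega * (1 - omega))
  pnorm(zb)                                     # convert z to power
}

# The two rows used in Program 2
pw <- matched_power(alpha   = c(0.05, 0.01),
                    n       = c(381, 689),
                    lambdaC = c(0.001, 0.002),
                    lambdaI = c(0.01, 0.02),
                    ra      = c(5, 1))
round(pw, 4)
```

Both values should land just above 0.80, matching the output of Program 2. That is expected, because the cluster counts 381 and 689 were themselves rounded up from a calculation targeting a power of 0.8.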