This page presents analysis of covariance in its simplest form, with 2 groups and 1 covariate.
It is assumed that users will have some idea of covariance analysis, so no detailed explanations are offered on this page. Some online tutorials are offered in the references, and inexperienced users are encouraged to consult expert statisticians before interpreting the results.
The discussion that follows is therefore intended to orient users to the programs specific to this page, and to help them follow the calculations and interpretations.
The Javascript algorithm was initially developed in Fortran, using formulae from Armitage's classic text of the 1970s. The calculations are based on the partition of variances.
Analysis of covariance has come a long way since then, and current programs, as represented by the R algorithm, rely on the General Linear Model, using numerical approximations to estimate coefficients. This model greatly increases the ability to handle complex situations, allowing simultaneous analysis of multiple factors (groups) and values (measurements), and adapting to outcomes that have different distributions. More discussion and example code on the General Linear Model can be found in GLM.php
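As a minimal illustration (a sketch, assuming the data frame df built in the data entry section below), the same covariance model can be expressed through R's glm procedure with a Gaussian family, which reduces to the ordinary linear model lm used on this page:

glm(Bwt ~ Grp + Gage, family = gaussian, data = df) # Gaussian GLM, equivalent to lm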
Example used on this page
Grp | Gage | Bwt |
G | 37 | 3048 |
B | 36 | 2813 |
G | 41 | 3622 |
G | 36 | 2706 |
B | 35 | 2581 |
B | 39 | 3442 |
G | 40 | 3453 |
B | 37 | 3172 |
G | 35 | 2386 |
B | 39 | 3555 |
G | 37 | 3029 |
B | 37 | 3185 |
G | 36 | 2670 |
B | 38 | 3314 |
G | 41 | 3596 |
B | 38 | 3312 |
G | 39 | 3200 |
B | 41 | 3667 |
B | 40 | 3643 |
G | 38 | 3212 |
G | 38 | 3135 |
G | 39 | 3366 |
Please note: The data presented on this page is artificially generated to demonstrate the procedures. It is not real.
We wish to study whether boys and girls have different weights at birth. We understand that birth weight is much influenced by gestational age, so we need to use this as a covariate for correction.
We obtained the data as shown in the table to the right. (Please note: studies of birth weight usually have cases in the hundreds or thousands; the small size here is to make the example easier to visualize.)
- The variable Grp represents grouping based on gender, G for girls, and B for boys
- The variable Gage represents the covariate gestational age, in weeks since the beginning of pregnancy
- The variable Bwt represents the outcome birth weight, in grams
Method 1: Partition of variance
This follows the formulae in Armitage's text, and consists of the following steps:
- Regressions between the covariate (Gage) and the outcome (Bwt) are calculated separately for the two groups, and the 2 regression coefficients are tested for significant difference. A significant difference means that there is an interaction between group and covariate (in this example, that the growth rates in the two groups are different), so the coefficients cannot be combined into a common one for the two groups, and covariance analysis should be terminated at this point.
The results from the example show that the two regression slopes differed by 1.6 grams per week, a difference that is not statistically significant, so covariance analysis can proceed.
- The two regression coefficients are merged into a single one, and the outcome is recalculated as the distance from the common regression line. The corrected outcomes are then compared (see the formulas after this list). In this example, the difference between the corrected means is 165 grams, with 95% confidence interval 79 to 251 grams, and statistically significant.
This compares with the uncorrected difference of 150 grams, with 95% confidence interval -159 to 458 grams, which is not statistically significant.
The difference in mean outcomes (Bwt) between the two groups was 150g before and 165g after covariate correction, not much changed. The variance, and therefore the confidence interval, was however greatly reduced by the use of the covariate, thus rendering the difference statistically significant.
- The results can be demonstrated in the plot shown to the right. The dots represent the data points, and the lines the regression lines. Red represents boys and blue girls. The black line is the combined regression line.
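In symbols, using the sums of squares and products computed in the Part 2 code below, the common slope and the adjusted difference in means used in these steps are:

$$b_c = \frac{S_{xy,1} + S_{xy,2}}{S_{xx,1} + S_{xx,2}}, \qquad d_{adj} = (\bar{y}_1 - \bar{y}_2) - b_c\,(\bar{x}_1 - \bar{x}_2)$$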
Method 2: procedure lm from CRAN
The procedure lm fits a linear model for a Gaussian distribution, and is available from the R archive CRAN (see reference). The basic formula is
lm(formula = outcome ~ var1 + var2 + var3 + ..., data=data frame or file)
The independent variables can be either factors (groups represented by text) or values (numerical data). The dependent variable is assumed to be normally distributed; the independent values need only be binomial or at least ordinal, and not necessarily normally distributed.
The advantage of using R is that a single line of code will produce all the answers required, regardless of the size and complexity of the data set. The disadvantage is that users are constrained by what the packaged program produces.
In this example, the formula is used twice
- lm(formula = Bwt ~ Grp + Gage + Grp * Gage, data=df) includes an interaction term Grp * Gage. This tests whether the two slopes are significantly different, and in this example the result is not significant, so the second calculation can proceed
- lm(formula = Bwt ~ Grp + Gage, data=df) does not include the interaction term, as it is not significant, and the program uses the same covariate coefficient (for Gage) in both groups
The coefficient estimates are the final information required. In this example girls are 165.33 grams lighter than boys, and the coefficient for the covariate represents a growth rate of 186.22 grams per week.
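If confidence intervals matching Method 1 are wanted, they can be obtained with confint, a standard companion to lm (not shown in the original output). For the model res2 fitted in Program 1 below, this should reproduce the 79 to 251 gram interval, with the sign reversed because GrpG measures girls relative to boys:

confint(res2) # 95% CIs for all coefficients
confint(res2, "GrpG") # CI for the group difference, about -251 to -79 grams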
References
Armitage P. (1980). Statistical Methods in Medical Research. Blackwell Scientific Publications, Oxford, UK. ISBN 0-632-05430-1. pp. 279-301.
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm Documentation for the linear model lm from CRAN
R code for comparing 2 regressions using raw data
This page presents 2 methods for this comparison: the generic R algorithm obtained from CRAN, and a replication of the Javascript program in R. They both use the same example data set.
Data entry using example data
The data is in 3 columns
- Col 1 = group, using characters. The label can be more than 1 character, but usually 1 will do. In this example it is gender, B for boys and G for girls
- Col 2 = the x variable (the covariate). In this example it is gestational age in weeks
- Col 3 = the y variable (the dependent or outcome variable). In this example it is birth weight in grams
# Data entry
myDat = ("
Grp Gage Bwt
G 37 3048
B 36 2813
G 41 3622
G 36 2706
B 35 2581
B 39 3442
G 40 3453
B 37 3172
G 35 2386
B 39 3555
G 37 3029
B 37 3185
G 36 2670
B 38 3314
G 41 3596
B 38 3312
G 39 3200
B 41 3667
B 40 3643
G 38 3212
G 38 3135
G 39 3366
")
df <- read.table(textConnection(myDat),header=TRUE)
#df # optional display of data frame
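One optional line may be worth adding here: from R 4.0 onward, read.table no longer converts strings to factors by default, so an explicit conversion of Grp keeps lm and plot behaving consistently across versions:

df$Grp <- factor(df$Grp) # ensure Grp is a factor (R 4.0+ leaves it as character)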
Program 1: Using Generic R Code from CRAN
This uses the data set already presented
res1 <- lm(formula = Bwt ~ Grp + Gage + Grp * Gage, data=df)
summary(res1) # linear model with interaction
res2 <- lm(formula = Bwt ~ Grp + Gage, data=df)
summary(res2) # linear model with no interaction
The results of the first linear model, with interaction, are as follows. The purpose is to evaluate the statistical significance of the interaction (Grp * Gage). If this is significant, then the relationships between gestational age (Gage) and birth weight (Bwt) are different in the two groups. In this result, the interaction is not statistically significant.
> summary(res1) # linear model with interaction
Call:
lm(formula = Bwt ~ Grp + Gage + Grp * Gage, data = df)
Residuals:
Min 1Q Median 3Q Max
-157.20 -65.56 -3.80 85.68 131.87
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3771.733 683.205 -5.521 3.05e-05 ***
GrpG -226.830 891.411 -0.254 0.802
Gage 185.267 17.960 10.315 5.53e-09 ***
GrpG:Gage 1.617 23.411 0.069 0.946
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 98.37 on 18 degrees of freedom
Multiple R-squared: 0.9383, Adjusted R-squared: 0.9281
F-statistic: 91.31 on 3 and 18 DF, p-value: 4.41e-11
Once the interaction is found to be not significant, the algorithm can be repeated without the interaction term, assuming a common regression. The emphasis here is on the coefficients. They show that
- coefficient for GrpG = -165.33, interpreted as girls are 165.33 grams lighter than boys
- coefficient for Gage = 186.22, interpreted as a growth rate of 186.22 grams per week for the two groups combined
> summary(res2) # linear model with no interaction
Call:
lm(formula = Bwt ~ Grp + Gage, data = df)
Residuals:
Min 1Q Median 3Q Max
-160.055 -64.447 -5.227 86.543 131.153
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3807.89 427.23 -8.913 3.25e-08 ***
GrpG -165.33 41.01 -4.031 0.000713 ***
Gage 186.22 11.21 16.605 9.09e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 95.76 on 19 degrees of freedom
Multiple R-squared: 0.9383, Adjusted R-squared: 0.9318
F-statistic: 144.5 on 2 and 19 DF, p-value: 3.205e-12
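As a cross-check (not part of the original output), the interaction can also be tested by comparing the two nested models directly; the resulting F test is equivalent to the t test of the GrpG:Gage term above:

anova(res2, res1) # F test of the interaction term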
Program 2: Using Native R duplicating the Javascript Program
Partition of Variance
The following is a single program, divided into parts so it is easier to follow. Part 1 is the data entry already shown above.
Part 2: Convert the input data into a series of vectors, each with 2 elements for the 2 groups
dfB <- subset(df,Grp=="B") # data frame with all Bs
dfG <- subset(df,Grp=="G") # data frame with all Gs
# create vectors
arN <- c(nrow(dfB), nrow(dfG)) # sample size
arMeanX <- c(mean(dfB$Gage),mean(dfG$Gage)) # mean x
arSdX <- c(sd(dfB$Gage),sd(dfG$Gage)) # SD x
arMeanY <- c(mean(dfB$Bwt),mean(dfG$Bwt)) # mean Y
arSdY <- c(sd(dfB$Bwt),sd(dfG$Bwt)) # SD y
# do regressions
matrix_coef <- summary(lm(dfB$Bwt ~ dfB$Gage))$coefficients
a1 = matrix_coef[1,1]
b1 = matrix_coef[2,1]
matrix_coef <- summary(lm(dfG$Bwt ~ dfG$Gage))$coefficients
a2 = matrix_coef[1,1]
b2 = matrix_coef[2,1]
arB <- c(b1, b2) # slope, regression coefficient b
arA <- c(a1,a2) # constant
# Derived parameters
arSsx = c(0, 0) # sum of squares x
arSsy = c(0, 0) # sum of squares y
arSxy = c(0, 0) # sum products
arRho = c(0, 0) # correlation coefficient rho
arT = c(0, 0) # student t
arP = c(0, 0) # significance p
for(j in 1:2)
{
arSsx[j] = arSdX[j]^2 * (arN[j] - 1)
arSsy[j] = arSdY[j]^2 * (arN[j] - 1)
arSxy[j] = arB[j] * arSsx[j]
arRho[j] = arSxy[j] / sqrt(arSsx[j] * arSsy[j])
arT[j] = arRho[j] * sqrt((arN[j] - 2) / (1 - arRho[j]^2))
arP[j] = 1 - pt(arT[j], arN[j] - 2) # 1 tail (upper)
}
# initial output of vectors
arN # sample size
arMeanX # mean x
arSdX # SD x
arMeanY # mean Y
arSdY # SD y
# slopes
arB # slope, regression coefficient b
arA # constant a
# correlation coefficients
arRho # correlation coefficient rho
arT # student t
arP # significance p
The initial parameters, as vectors, are as follows
> # initial output of vectors
> arN # sample size
[1] 10 12
> arMeanX # mean x
[1] 38.00000 38.08333
> arSdX # SD x
[1] 1.825742 1.975225
> arMeanY # mean Y
[1] 3268.400 3118.583
> arSdY # SD y
[1] 351.4447 380.3308
> # slopes
> arB # slope, regression coefficient b
[1] 185.2667 186.8835
> arA # constant a
[1] -3771.733 -3998.563
> # correlation coefficients
> arRho # correlation coefficient rho
[1] 0.9624533 0.9705682
> arT # student t
[1] 10.02857 12.74448
> arP # significance p
[1] 4.154629e-06 8.277119e-08
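In symbols, the quantities derived in the Part 2 loop are:

$$S_{xx} = (n-1)s_x^2, \quad S_{yy} = (n-1)s_y^2, \quad S_{xy} = b\,S_{xx}, \quad \rho = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}, \quad t = \rho\sqrt{\frac{n-2}{1-\rho^2}}$$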
Part 3: Compare the two xs and ys
diffX = arMeanX[1] - arMeanX[2]
dfX = arN[1] + arN[2] - 2
pooledX = sqrt(((arN[1]-1)*arSdX[1]^2 + (arN[2]-1)*arSdX[2]^2) / dfX)
seX = pooledX * sqrt(1/arN[1] + 1/arN[2])
llX = diffX - 1.96 * seX
ulX = diffX + 1.96 * seX
diffY = arMeanY[1] - arMeanY[2]
dfY = arN[1] + arN[2] - 2;
pooledY = sqrt(((arN[1]-1)*arSdY[1]^2 + (arN[2]-1)*arSdY[2]^2) / dfY)
seY = pooledY * sqrt(1/arN[1] + 1/arN[2]);
llY = diffY - 1.96 * seY
ulY = diffY + 1.96 * seY
# output 95% CI difference in x and y
c(llX, ulX) # 95% CI difference in x
c(llY, ulY) # 95% CI difference in y
The results are as follows
> # output 95% CI difference in x and y
> c(llX, ulX) # 95% CI difference in x
[1] -1.685749 1.519082
> c(llY, ulY) # 95% CI difference in y
[1] -158.6923 458.3256
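Note that this part uses the normal quantile 1.96; with samples this small, the t quantile used in Part 5 would give a slightly wider interval. A minimal variant for the y difference:

tY <- abs(qt(0.025, dfY)) # about 2.086 for 20 degrees of freedom
c(diffY - tY * seY, diffY + tY * seY) # 95% CI using the t distribution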
Part 4: Comparing the two slopes
diffSlope = arB[1] - arB[2] # diff slope
s2 = ((arSsy[1] - arSxy[1] * arSxy[1] / arSsx[1]) +
(arSsy[2] - arSxy[2] * arSxy[2] / arSsx[2])) /
(arN[1] + arN[2] - 4)
seSlope = sqrt(s2 * (1 / arSsx[1] + 1 / arSsx[2])) # SE of difference
tSlope = diffSlope / seSlope # t test
dfSlope = arN[1] + arN[2] - 4; # degrees of freedom
pSlope = (1 - pt(abs(tSlope), dfSlope)) * 2 # Type I error 2 tail
# output comparison 2 slopes
diffSlope # difference between slopes
seSlope # standard error of difference
tSlope # t
dfSlope # deg freedom
pSlope # significance p (2 tail)
The results are as follows
> # output comparison 2 slopes
> diffSlope # difference between slopes
[1] -1.616828
> seSlope # standard error of difference
[1] 23.41085
> tSlope # t
[1] -0.0690632
> dfSlope # deg freedom
[1] 18
> pSlope # significance p (2 tail)
[1] 0.9457008
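In symbols, the test in Part 4 pools the residual variances of the two regressions:

$$s^2 = \frac{RSS_1 + RSS_2}{n_1 + n_2 - 4}, \quad RSS_j = S_{yy,j} - \frac{S_{xy,j}^2}{S_{xx,j}}, \quad SE(b_1 - b_2) = \sqrt{s^2\left(\frac{1}{S_{xx,1}} + \frac{1}{S_{xx,2}}\right)}$$

with $t = (b_1 - b_2)/SE$ on $n_1 + n_2 - 4$ degrees of freedom.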
Part 5: Combining the two slopes and comparing the two adjusted y means
commonSlope = (arSxy[1] + arSxy[2]) / (arSsx[1] + arSsx[2]) # common slope
grandMean = (arMeanX[1] * arN[1] + arMeanX[2] * arN[2]) / (arN[1] + arN[2]) # mean of x
adjMean1 = arMeanY[1] + commonSlope * (grandMean - arMeanX[1]) # adjusted mean y, group 1 (B)
adjMean2 = arMeanY[2] + commonSlope * (grandMean - arMeanX[2]) # adjusted mean y, group 2 (G)
diffMean = arMeanY[1] - arMeanY[2] - commonSlope * (arMeanX[1] - arMeanX[2]) # adjusted diff
s2 = (arSsy[1] + arSsy[2] - (arSxy[1] + arSxy[2]) * (arSxy[1] + arSxy[2]) / (arSsx[1] + arSsx[2])) /
(arN[1] + arN[2] - 3);
seMean = s2 * (1.0 / arN[1] + 1.0 / arN[2] + (arMeanX[1] - arMeanX[2]) * (arMeanX[1] - arMeanX[2]) /
(arSsx[1] + arSsx[2]));
seMean = sqrt(seMean) # SE difference
tMean = diffMean / seMean # t test
dfMean = arN[1] + arN[2] - 3; # degrees of freedom
pMean = (1 - pt(abs(tMean), dfMean)) * 2 # p Type I Error (2 tail)
# 95% CI
t = abs(qt(0.025,dfMean)) # t value for p=0.05 2 tail
ll = diffMean - t * seMean # lower limit 95% CI
ul = diffMean + t * seMean # upper limit 95% CI
# output of combined data
commonSlope # common slope
diffMean # difference between adjusted means
seMean # Standard Error of difference
tMean # t
dfMean # df
pMean # significance p 2 tail
c(ll,ul) # 95% confidence interval of adjusted difference in y
The results are as follows. Note that the adjusted difference, its standard error, t, degrees of freedom, and p reproduce the GrpG row of the no-interaction model in Program 1 (with the sign reversed).
> commonSlope # common slope
[1] 186.2183
> diffMean # difference between adjusted means
[1] 165.3349
> seMean # Standard Error of difference
[1] 41.0136
> tMean # t
[1] 4.03122
> dfMean # df
[1] 19
> pMean # significance p 2 tail
[1] 0.0007134561
> c(ll,ul) # 95% confidence interval of adjusted difference in y
[1] 79.4924 251.1773
Part 6: plot the dots and regression lines
# Plot
palette(c("red","blue")) # change palette to B=red G=blue (alphabetical order)
par(pin=c(4.2, 3)) # set plotting window to 4.2x3 inches
plot(x = df$Gage, # Gest age on the x axis
y = df$Bwt, # BWt on the y axis
col = df$Grp, # in 2 colors (B and G)
pch = 16, # plotting symbol: filled circle
xlab = "Gestation (Gage)", # x label
ylab = "Birth Weight (Bwt)") # y label
abline(lm(dfB$Bwt~dfB$Gage),col="red") # regression line for B
abline(lm(dfG$Bwt~dfG$Gage),col="blue") # regression line for G
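A combined regression line, as described for the Method 1 plot, can also be added. A sketch, assuming Part 5 has been run so that commonSlope and grandMean are available:

abline(a = mean(df$Bwt) - commonSlope * grandMean, # line with the common slope
       b = commonSlope, col = "black") # through the grand means of x and y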

The resulting plot is as shown to the right. In this plot, the palette assigns red and blue in alphabetical order, so that group B for boys is represented in red, and G for girls in blue. The dots are the data points, and the lines the regression lines for the two groups.