Content Disclaimer Copyright @2020. All Rights Reserved. |


Explanations and References
The references on this page include the textbook used for the Javascript algorithm and a web-based teaching article on multiple regression. Beyond these, it is assumed that users coming to this page will have other access to information and advice on the subject. The explanations and discussions that follow are therefore intended to help users follow the procedures and interpret the results. They are not meant to be teaching or authoritative in nature.
## Example used on this page
It should be noted that the data presented here are artificially generated to demonstrate the procedures, and are not real. The sample size is also deliberately small so that the data are easier to visualize. In real life, an adequate sample size will be very much larger. See the subject index for estimating the sample size for multiple regression.
The example develops a model that predicts the birth weight of babies using multiple regression. The independent (predictor) variables are:

- maternal age in years (Mage)
- maternal height in cm (Mht)
- gestational age in weeks since the beginning of pregnancy (Gest)
- sex of the baby (Sex, 0 for boys and 1 for girls)

The dependent variable is birth weight in grams (Bwt). For this exercise we will use 22 cases. The data are presented in the table to the right. Please note that the Javascript program on this page allows any number of variables (columns of data), but it designates the last (rightmost) column as the dependent variable.

## Correlation analysis
The partial correlation coefficient reflects the correlation between 2 variables after correcting for their correlations with the other variables in the data set. A large difference between the partial and non-partial (ordinary) coefficients therefore reflects the possibility of excessive overlap (collinearity) between the measurements.

## Multiple Regression
- Equation: y(col 5) = -9164.45 + 1.70(col 1) + 23.65(col 2) + 223.19(col 3) - 209.15(col 4)
- Col 1, Mage, increases Bwt by 1.7 g per year, and is not statistically significant (p=0.87)
- Col 2, Mht, increases Bwt by 23.6 g per cm, and is statistically significant only at the p<0.1 level (p=0.06)
- Col 3, Gest, increases Bwt by 223.2 g per week, and is statistically highly significant at p<0.0001
- Col 4, Sex, decreases Bwt by 209.2 g if the baby is a girl, and is statistically significant at the p<0.05 level
- The equation, as a whole, has a multiple correlation coefficient of R=0.96, statistically highly significant with p<0.0001
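The coefficients and R above come from ordinary least squares. The following is a minimal base-R sketch that reproduces the estimates from the 22 example cases. Note one assumption: it solves the normal equations directly, whereas `lm()` uses a QR decomposition internally. The results should agree with the `lm()` output in Part 3 of the R program on this page. The last lines apply the fitted equation to a hypothetical case (a 27-year-old, 165 cm mother, 38 weeks gestation, a girl). These case values are illustrative only.

```r
# Example data: the 22 cases used on this page (same values as Part 1 of the R program)
Mage <- c(24, 29, 29, 21, 35, 27, 26, 34, 25, 28, 32, 31, 26, 21, 21, 24, 34, 25, 26, 27, 27, 21)
Mht  <- c(170, 161, 167, 165, 168, 161, 163, 167, 165, 170, 167, 169, 161, 165, 166, 164,
          169, 161, 167, 162, 160, 167)
Gest <- c(37, 36, 41, 36, 35, 39, 40, 37, 35, 39, 37, 37, 36, 38, 41, 38, 38, 41, 40, 33, 38, 39)
Sex  <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1)  # 0 = boy, 1 = girl
Bwt  <- c(3048, 2813, 3622, 2706, 2581, 3442, 3453, 3172, 2386, 3555, 3029, 3185, 2670, 3314,
          3596, 3312, 3414, 3667, 3643, 1398, 3135, 3366)                 # birth weight in grams

# Design matrix: a column of 1s for the intercept, then one column per predictor
X <- cbind(Intercept = 1, Mage, Mht, Gest, Sex)
y <- Bwt

# Least-squares coefficients from the normal equations (X'X) b = X'y
b <- drop(solve(crossprod(X), crossprod(X, y)))

# Multiple R-squared and the multiple correlation coefficient R
yhat <- drop(X %*% b)
r2 <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
R_mult <- sqrt(r2)

print(round(b, 3))
print(round(c(R2 = r2, R = R_mult), 4))

# Predicted birth weight (g) for a hypothetical case: age 27, height 165 cm, 38 weeks, girl
new_case <- c(Intercept = 1, Mage = 27, Mht = 165, Gest = 38, Sex = 1)
pred <- sum(new_case * b)
print(pred)
```

Solving the normal equations is less numerically stable than QR. For a small, well-behaved data set like this one, however, the two approaches agree closely.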
## Multiple Standardized Regression

Multiple standardized regression is the same as multiple regression, except that all measurements are standardized to Standard Deviation units z, where z = (value - mean) / SD. The coefficients produced are therefore on the same scale, making the structure of the relationships between variables easier to visualize.

Each partial standardized regression coefficient (β) represents the change in the dependent variable (y), in number of SDs, for each 1 SD change in the independent variable. The differences between the βs therefore also reflect the relative influence of each independent variable on the dependent variable.
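Because every variable is rescaled by its own SD, each β can also be obtained directly from the ordinary regression coefficient b, as β = b × SD(x) / SD(y). The sketch below applies this to the unstandardized estimates from Part 3 and the SDs from Part 2 of the R program on this page. It should reproduce the standardized estimates in Part 4b to about 4 decimal places.

```r
# Unstandardized coefficients (b) from Part 3 and SDs from Part 2 of the R output
b    <- c(Mage = 1.701, Mht = 23.649, Gest = 223.194, Sex = -209.150)
sd_x <- c(Mage = 4.281491, Mht = 3.191235, Gest = 2.136571, Sex = 0.5117663)
sd_y <- 532.697   # SD of birth weight (Bwt)

# Standardized coefficient for each predictor: beta_j = b_j * SD(x_j) / SD(y)
beta <- b * sd_x / sd_y
print(round(beta, 4))   # compare with the ZMage, ZMht, ZGest and ZSex estimates in Part 4b
```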
- Equation: $y(z_5) = 0.01(z_1) + 0.14(z_2) + 0.90(z_3) - 0.20(z_4)$
- An increase of 1 SD in maternal age produces a 0.01 SD increase in birth weight, a trivial effect
- An increase of 1 SD in maternal height produces a 0.14 SD increase in birth weight, about 10 times the effect of maternal age
- An increase of 1 SD in gestational age produces a 0.90 SD increase in birth weight, nearly in tandem, about 65 times the effect of maternal age and about 6 times that of maternal height
- An increase of 1 SD in Sex produces a 0.20 SD decrease in birth weight, about 15 times the effect of maternal age, 1.4 times that of maternal height, and about 0.22 times (22%) that of gestation

The ratios are calculated from the unrounded standardized coefficients (Part 4b of the R program).
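The comparisons above are ratios of the absolute β values. The short sketch below computes all pairwise ratios from the standardized estimates reported in Part 4b (ZMage 0.01367, ZMht 0.1417, ZGest 0.8952, ZSex -0.2009). Each cell is |β of the row variable| divided by |β of the column variable|.

```r
# Standardized coefficients (beta) as estimated in Part 4b of the R output
beta <- c(Mage = 0.01367, Mht = 0.1417, Gest = 0.8952, Sex = -0.2009)

# Relative influence: |beta of row variable| / |beta of column variable|
ratios <- outer(abs(beta), abs(beta), "/")
print(round(ratios, 2))
```

For example, the row `Gest`, column `Mage` entry gives how many times larger the effect of gestational age is than that of maternal age.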
## References

- Steel RGD, Torrie JH, Dickey DA (1997) Principles and Procedures of Statistics: A Biometrical Approach. 3rd Ed. McGraw-Hill Inc, New York NY 10020. ISBN 0-07-061028-2, p. 322-351
- https://wiki.gis.com/wiki/index.php/Multiple_Regression: a detailed explanation of multiple regression, available online
The following R code is a single program, divided into parts so that it is easier to follow.
## Part 1: Data entry

```r
# Data entry to dataframe
myDat = ("
Mage Mht Gest Sex Bwt
24 170 37 1 3048
29 161 36 0 2813
29 167 41 1 3622
21 165 36 1 2706
35 168 35 0 2581
27 161 39 0 3442
26 163 40 1 3453
34 167 37 0 3172
25 165 35 1 2386
28 170 39 0 3555
32 167 37 1 3029
31 169 37 0 3185
26 161 36 1 2670
21 165 38 0 3314
21 166 41 1 3596
24 164 38 0 3312
34 169 38 0 3414
25 161 41 0 3667
26 167 40 0 3643
27 162 33 1 1398
27 160 38 1 3135
21 167 39 1 3366
")
df <- read.table(textConnection(myDat), header = TRUE)
#summary(df)   # optional display of input data
```

## Part 2: Means and Standard Deviations

```r
# mean and SD of each variable
meanMage = mean(df$Mage)
sdMage = sd(df$Mage)
meanMht = mean(df$Mht)
sdMht = sd(df$Mht)
meanGest = mean(df$Gest)
sdGest = sd(df$Gest)
meanSex = mean(df$Sex)
sdSex = sd(df$Sex)
meanBwt = mean(df$Bwt)
sdBwt = sd(df$Bwt)
# show means and SDs
c(meanMage,sdMage)
c(meanMht,sdMht)
c(meanGest,sdGest)
c(meanSex,sdSex)
c(meanBwt,sdBwt)
```

The means and SD results are as follows:

```
> # show means and Sds
> c(meanMage,sdMage)
[1] 26.954545  4.281491
> c(meanMht,sdMht)
[1] 165.227273   3.191235
> c(meanGest,sdGest)
[1] 37.772727  2.136571
> c(meanSex,sdSex)
[1] 0.5000000 0.5117663
> c(meanBwt,sdBwt)
[1] 3113.955  532.697
```

## Part 3: Multiple regression

```r
RegRes <- lm(Bwt ~ Mage + Mht + Gest + Sex, data = df)   # Multiple regression
summary(RegRes)                                          # show multiple regression results
```

The results are as follows:

```
> summary(RegRes) # show multiple regression results

Call:
lm(formula = Bwt ~ Mage + Mht + Gest + Sex, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-469.89  -86.47   46.49   84.17  198.44 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -9165.476   1945.069  -4.712 0.000201 ***
Mage            1.701      9.864   0.172 0.865124    
Mht            23.649     11.724   2.017 0.059759 .  
Gest          223.194     17.920  12.455 5.68e-10 ***
Sex          -209.150     77.511  -2.698 0.015228 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 163.7 on 17 degrees of freedom
Multiple R-squared:  0.9236,  Adjusted R-squared:  0.9056 
F-statistic: 51.36 on 4 and 17 DF,  p-value: 2.849e-09
```

## Part 4: Repeat Multiple regression using Standardized values

Standardized values z = (value - mean) / SD

Part 4a: Create standardized z values

```r
# standardization: create z variables
df$ZMage <- (df$Mage - meanMage) / sdMage
df$ZMht <- (df$Mht - meanMht) / sdMht
df$ZGest <- (df$Gest - meanGest) / sdGest
df$ZSex <- (df$Sex - meanSex) / sdSex
df$ZBwt <- (df$Bwt - meanBwt) / sdBwt
```

Part 4b: Standardized multiple regression using z values

```r
RegZRes <- lm(ZBwt ~ ZMage + ZMht + ZGest + ZSex, data = df)   # Multiple regression on z values
summary(RegZRes)                                               # show multiple regression results
```

The results are as follows. For all variables, mean = 0 and SD = 1.

```
> summary(RegZRes) # show multiple regression results

Call:
lm(formula = ZBwt ~ ZMage + ZMht + ZGest + ZSex, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.88209 -0.16233  0.08728  0.15800  0.37252 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.267e-16  6.551e-02   0.000   1.0000    
ZMage        1.367e-02  7.928e-02   0.172   0.8651    
ZMht         1.417e-01  7.024e-02   2.017   0.0598 .  
ZGest        8.952e-01  7.188e-02  12.455 5.68e-10 ***
ZSex        -2.009e-01  7.447e-02  -2.698   0.0152 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3073 on 17 degrees of freedom
Multiple R-squared:  0.9236,  Adjusted R-squared:  0.9056 
F-statistic: 51.36 on 4 and 17 DF,  p-value: 2.849e-09
```

To make the coefficients easier to read, they are translated as follows:

```
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0          0.0655  0.0000   1.0000
ZMage         0.0137     0.0793  0.1720   0.8651
ZMht          0.1417     0.0702  2.0170   0.0598 .
ZGest         0.8952     0.0719 12.4550  <0.0001 ***
ZSex         -0.2009     0.0745 -2.6980   0.0152 *
```
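The R program above does not include the correlation analysis described in the Correlation analysis section. The following is a minimal base-R sketch of one standard approach, which is not necessarily the page's Javascript algorithm. It computes the partial correlations from the inverse P of the correlation matrix, using r(i,j | rest) = -P[i,j] / sqrt(P[i,i] × P[j,j]). As a check, the partial correlation of each predictor with Bwt should equal t / sqrt(t² + 17). Here t is that predictor's t value in Part 3, and 17 is the residual degrees of freedom.

```r
# Example data: the 22 cases used on this page (same values as Part 1)
Mage <- c(24, 29, 29, 21, 35, 27, 26, 34, 25, 28, 32, 31, 26, 21, 21, 24, 34, 25, 26, 27, 27, 21)
Mht  <- c(170, 161, 167, 165, 168, 161, 163, 167, 165, 170, 167, 169, 161, 165, 166, 164,
          169, 161, 167, 162, 160, 167)
Gest <- c(37, 36, 41, 36, 35, 39, 40, 37, 35, 39, 37, 37, 36, 38, 41, 38, 38, 41, 40, 33, 38, 39)
Sex  <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1)
Bwt  <- c(3048, 2813, 3622, 2706, 2581, 3442, 3453, 3172, 2386, 3555, 3029, 3185, 2670, 3314,
          3596, 3312, 3414, 3667, 3643, 1398, 3135, 3366)
dat <- cbind(Mage, Mht, Gest, Sex, Bwt)

r <- cor(dat)        # ordinary (non-partial) correlation matrix
P <- solve(r)        # inverse of the correlation matrix

# Partial correlation of variables i and j, correcting for all other variables
pr <- -P / sqrt(outer(diag(P), diag(P)))
diag(pr) <- 1
dimnames(pr) <- dimnames(r)

print(round(r, 3))   # non-partial coefficients
print(round(pr, 3))  # partial coefficients; compare the two matrices for large differences
```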