Explanations and References Currently, multivariate logistic regressions (binomial, multinomial, or ordinal) are used to establish the regression relationship between one or more independent variables and a probability (proportion, risk) as the dependent variable. These algorithms are flexible and widely accepted, but they require specialized software and an understanding of complex multivariate statistics. StatsToDo presents some code samples in R for those who wish to access these algorithms (see Index Subjects).
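For comparison, the binomial logistic regression mentioned above can be fitted to the same grouped data with base R's glm(). This sketch is an addition for illustration, not part of the original StatsToDo listing; the data values are those of the worked example below.

```r
# Binomial logistic regression on the same grouped data (sketch for comparison)
df <- data.frame(X    = c(1990, 1992, 1995),
                 NPos = c(10, 8, 61),      # number of failures per year
                 NNeg = c(100, 50, 300))   # number of non-failures per year
# cbind(successes, failures) on the left-hand side fits grouped binomial counts
fit <- glm(cbind(NPos, NNeg) ~ X, family = binomial, data = df)
summary(fit)   # the X coefficient is the change in log-odds per year
```

Note that the fitted slope is on the log-odds scale, so it is not directly comparable to the 0.015 change in proportion per year reported below, but its sign and significance should agree with the chi square partition.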
This page provides an earlier algorithm to perform simple linear regression between a single ordinal predictor and an outcome that is a proportion. The calculations are based on the Chi Square distribution.
The entry data consists of 3 columns: the ordinal predictor value (X), the number of positive cases (NPos), and the number of negative cases (NNeg), with one row per group.
The data in the example were artificially created to demonstrate the procedure, and are not real. They purport to be from a study of business failures over the years.
The data were compiled, and the probability of failure (proportion, risk, Ppos) calculated for each year. The results are presented in the table of input data shown with the program output below. It can be seen that the failure rates were 9.1% for 1990, 13.8% for 1992, and 16.9% for 1995, and that the overall failure rate was 14.9%.
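These percentages follow directly from the counts, and can be checked in one line; this quick check is an addition, not part of the original listing.

```r
# Per-year and overall failure rates from the example counts
NPos <- c(10, 8, 61)       # failures per year
NNeg <- c(100, 50, 300)    # non-failures per year
round(100 * NPos / (NPos + NNeg), 1)            # per-year rates: 9.1 13.8 16.9
round(100 * sum(NPos) / sum(NPos + NNeg), 1)    # overall rate: 14.9
```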
The program now partitions the Chi Square, as shown in the table of results below. The analysis shows that the Chi Square for regression is significant at the p < 0.05 level. Once this is partitioned out, the residual Chi Square is not statistically significant. A conclusion can therefore be drawn that, other than an increasing trend, the proportions of business failures were otherwise homogeneous during those years.

Finally, the regression coefficient is calculated. The change in proportion per unit row value = 0.015 indicates that, between 1990 and 1995, the trend of business failures increased by 1.5% per year.

References

Steel R.G.D., Torrie J.H., Dickey D.A. Principles and Procedures of Statistics: A Biomedical Approach. 3rd ed. (1997) ISBN 0-07-061028-2, p. 520-521
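The regression component of this partition is algebraically the Cochran-Armitage test for trend, which base R provides as prop.trend.test(). The following cross-check is an addition to the original listing; with the example data it should reproduce the Regression row of the partition (ChiSq approximately 4.018, df = 1, p approximately 0.045).

```r
# Chi Square for trend in proportions, using the year as the score
prop.trend.test(x = c(10, 8, 61),            # NPos per year
                n = c(110, 58, 361),         # RowTot per year
                score = c(1990, 1992, 1995)) # row values (years)
```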
The R program for regression of proportion is a single continuous program. To make it easier to follow, the listing is divided into 2 sections.
Section 1: Initial data input and matrix of summaries

# Section 1: Preparation
dat = ("
X NPos NNeg
1990 10 100
1992 8 50
1995 61 300
")
df <- read.table(textConnection(dat), header=TRUE)   # conversion to data frame
df$RowTot <- df$NPos + df$NNeg                       # total number each row
df$Prob <- df$NPos / df$RowTot                       # probability of Pos each row
df                                                   # Summary of Input Data

The initial matrix, with all the data necessary for the calculations, is as follows. Please note:
> df # Summary of Input Data
     X NPos NNeg RowTot       Prob
1 1990   10  100    110 0.09090909
2 1992    8   50     58 0.13793103
3 1995   61  300    361 0.16897507

Section 2 is the actual calculations

# Section 2: Preparation for calculation
rows = nrow(df)
posTot = sum(df$NPos)
negTot = sum(df$NNeg)
tot = sum(df$RowTot)
# vectors for results
Source <- vector()
ChiSq <- vector()
DF <- vector()
P <- vector()
# calculate total chi sq
zw = 0
chiTot = 0
dfTot = rows - 1
for(i in 1:rows)                        # for each row
{
    zw = zw + df$X[i] * df$RowTot[i]    # row value x row count
    e = df$RowTot[i] * posTot / tot     # expected number pos
    o = df$NPos[i]                      # observed number pos
    chiTot = chiTot + (o - e)^2 / e     # add to Chi Sq
    e = df$RowTot[i] * negTot / tot     # expected number neg
    o = df$NNeg[i]                      # observed number neg
    chiTot = chiTot + (o - e)^2 / e     # add to Chi Sq
}
pTot = 1 - pchisq(chiTot, df=dfTot)
Source <- append(Source, "Total")       # add to vectors for eventual display
ChiSq <- append(ChiSq, chiTot)
DF <- append(DF, dfTot)
P <- append(P, pTot)
# Calculate regression and its chi sq
p2 = posTot / tot                       # overall probability of pos
top = 0
bot = 0
for(i in 1:rows)
{
    top = top + df$X[i] * df$NPos[i]        # sum of row value x number pos
    bot = bot + df$X[i]^2 * df$RowTot[i]    # sum of row value squared x row count
}
# Calculation of regression coefficient
top = top - posTot * zw / tot
bot = bot - zw^2 / tot
reg = top / bot                             # regression coefficient
# calculate chi sq regression
chiReg = top^2 / (bot * p2 * (1 - p2))      # chi sq for regression
pReg = 1 - pchisq(chiReg, df=1)
Source <- append(Source, "Regression")      # add to vectors for eventual display
ChiSq <- append(ChiSq, chiReg)
DF <- append(DF, 1)
P <- append(P, pReg)
# Calculate residual chi sq
chiRes = chiTot - chiReg                    # chi sq residual
dfRes = dfTot - 1
pRes = 1 - pchisq(chiRes, df=dfRes)
Source <- append(Source, "Residual")        # add to vectors for eventual display
ChiSq <- append(ChiSq, chiRes)
DF <- append(DF, dfRes)
P <- append(P, pRes)
# output
dfRes <- data.frame(Source, ChiSq, DF, P)   # combine vectors into data frame for display
dfRes                                       # display chi sq, df, and significance in p
# Regression coefficient
reg                                         # change in probability per unit of X

The results are as follows

> dfRes
      Source      ChiSq DF          P
1      Total 4.11131566  2 0.12800860
2 Regression 4.01760141  1 0.04502771
3   Residual 0.09371425  1 0.75950732
> reg # regression coefficient: change in probability per unit of X
[1] 0.014958
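The Total row can be checked against R's built-in Pearson Chi Square test on the 3 x 2 table of counts; this check is an addition to the original listing.

```r
# Pearson Chi Square on the 3 x 2 table of (NPos, NNeg) by year
tab <- matrix(c(10, 100,
                 8,  50,
                61, 300), nrow = 3, byrow = TRUE)
chisq.test(tab)   # X-squared should be ~4.111 with df = 2, matching the Total row
```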