![]() | Content Disclaimer Copyright @2020. All Rights Reserved. |
Links : Home Index (Subjects) Contact StatsToDo |
Explanations
Javascript Program
R Codes
D
E
F
A common statistical problem is to describe a relationship between two measurements that is not linearly related
(in a straight line).
When such a relationship can be mathematically defined, such as one is the square or square root of the other, the variables can be transformed mathematically before a relationship established by linear regression (programs for transformation and linear regression can be found via the subject index Often however, a curved relationship that exists may appear regular and consistent, but a mathematical definition of that relationship is not available, and an empitical "best fit" algorithm is required. A common method of doing this is the polynomial cuve fit, with the formula
Polynomial curve fitting can be carried out using multiple regression, where the x variable is expanded by its powers. This is a common procedure in the biochemical laboratory, where the depth of a color reaction is related to the concentration of a chemical of interest When polynomial curve fitting is used to created reference charts such as growth charts for weights and heights of children, more than the mean regression curve line is required. There is a need to also knowing the Standard Deviation (SD), so that deviation from that central line can be meaningfully interpreted. Using the traditional analysis of variance to estimate SD is problemattical for polynomial curve fit. Firstly, the variances from the coefficients of different power need to be integrated, and the complexity involved may themselves generate random variations. Secondly, in many growth type situations, the SD changes with size, and relying on the assumptions of the analysis of variance model may produce misleading interpretations. Altman (see reference) described a two stage procedure that side stepped these problem. Instead of estimating SD from a theoretical construct, the algorithm firstly curve fit the mean regression line, then a second curve fit to estimate the SD around that mean line. In other words, the curve fit of the mean curve line and its Standard Deviation are both modelled from the data itself. The algorithm is clearly described in Altman's paper, and also in the javascript and R programs presented on this page. The example from this page may help to follow these procedures. ExamplePlease note: that the data in this example is artificially generated to demonstrate the procedures and not real. Also the sample size is deliberately small to make visualization easier. In any growth chart, hundreds if not thousands of cases are required.
The table to the right contains input data and calculated results (to 2 decimal point precision for clarity) Example: We wish to develop a growth chart for birth weights of babies (in grams), according to the gestational age (in weeks). For this exercise, we used the data from 22 new born babies, the measurements are as presented in the columns 1 and 2 of the table to the right. The independent variable, X, is gestational age in weeks since the beginning of pregnancy. The dependent variable, Y, is birth weight in grams. To set the parameters for calculations, we understand that the relationship between gestation and weight may be a complex curve, and that variation may also changed with gestation. After some trials and errors, we decided to curve fit Y (grams) against X (weeks) to the power of 3, and the Standard Deviation around that curve to the power of 2. We then described the 95% confidence of the fitted line. Step 1: Curve fit X and Y: We perform multiple regression curve fit to the power of 3 between X (gestational age in weeks) and Y (birth weight in grams), and produced the following euation
Step 2: Curve fit X and Standard Deviation (SD) around Fitted Y: The absolute difference between each Y and fitted Y values are used to curve fit agains X to the power of 2. The result coefficients are then multiploed by π/2 to produce the coefficients for Standard Deviations around the fitted Y. The result coefficients are
The SD and the two limits of the 95% confidence interval are then calculated and presented as columns 4, 5, and 6 of the table above and to the right. Step 3: Evaluate any x, y pair using the curve fitted coefficients: The following procedures can be used to correct any outcome y value by its paired x value, using the two sets of curve fit coefficients. The values from the first row of the table above and to the right is used in the following demonstration
Step 4: A Javascript Code to calculate Y from X: The whole point of performing a curve fir is to produce coefficients from a set of reference data, and use these to calculate y values from x values in the future. Although the calculations are conceptually simple, they are nevertheless tedious, especially if they have to be carried out repeatedly. The program on this page therefore produces a short Javascript program which will do these calculations, including the coefficients calculated from the input data, to produce Y and its 95% confidence interval from any X value.
User can copy and paste the codes into any text editor, and save the file as a html page. The program can then be used on any web browser. Some users may further elaborate on the codes to improve its presentation, and to adapt it to use in bulk calculations. Step 5: Plotting the data and the results: The data points, and the 3 curve fitted lines (mean and the confidence limits) can be plotted in a scatter plot, as shown to the right ReferencesAltman DG (1993) Constructing age-related reference centiles using absolute residuals. Statistics in Medicine 12(10):917-924https://www.medcalc.org/manual/age-related-reference-interval.php Medical online cqlculator using this method
To Harvest the Bitmap
Curvefit (ROC)Altman's algorithm
Ref: Altman DG (1993) Constructing age-related reference centiles using absolute residuals. Statistics in Medicine 12(10):917-924 The following is a single program, but divided into parts so it is easier to follow Part 1: Subroutine to calculate cueve fitThe function calculates a vector of Y values from a vector of curve fit coefficients and a vector of x values, where
# subroutine to calculare vector of Y from vector of X using coefficient vector CalCurveFitValues <- function(coefVec, datVec) { f = length(coefVec) # length of coefficient vector n = length(datVec) # length of data vector (X) vecResult <- vector() # result vector (Y) for ( i in 1:n) { x = datVec[i] y = coefVec[1] + coefVec[2] * x for (j in seq(3,f, by=1)){ y = y + coefVec[j] * x^(j-1) } vecResult[i] = y } vecResult } Part 2: Main program starts: Data entry# data entry myDat = (" X Y 37 3048 36 2813 41 3622 36 2706 35 2581 39 3442 40 3453 37 3172 35 2386 39 3555 37 3029 37 3185 36 2670 38 3314 41 3596 38 3312 38 3414 41 3667 40 3643 33 1398 38 3135 39 3366 ") # Set power of polynomial curve fitting pwLine = 3 # power of fitting the line # power of curve fit for line pwSD = 2 # power of fitting the SD # power of curve fit for Standard Deviation cfInt = 95 # % confidence interval # % of confidence for confidence intervals # Calculate z for 2 tails z = qnorm((100 - (100 - cfInt) / 2) / 100) # converting % confidence into z for calculating confidence intervals # create dataframe from input data dfis the dataframe df <- read.table(textConnection(myDat),header=TRUE) #df # optional display of input data Part 3: Curve fit the main line# Curve fit the line resLine<-lm(formula = df$Y ~ poly(df$X, pwLine, raw=TRUE)) # summary(resLine) # optional R display of curve fit results # Extract Coefficients coefLine <- coef(summary(resLine))[1:(pwLine+1)] coefLine # output coefficients for fitted Y valueThe vector is displayed as follows. The first item is the constant, the rest the coefficients for x1, x2, x3 > coefLine # output coefficients for fitted Y value [1] -144612.43666 10177.18075 -233.18266 1.78399The regressed values (RegY) for each x value of the input data is then calculated and add to the data frame # calculate regressed Y value df$RegY <- CalCurveFitValues(coefLine,df$X) #df optional testing showing the fitted Y value Part 3: Curve fit the Srandard DeviationThe absolute difference between the Y value and fitted Y value (RegY) is used as the dependent variable, and curve fitted against X. The result formula is then multiplied by π/2 to result in the coefficients for the Standard Deviation# Curve fit SD vecAbsDif <- abs(df$RegY - df$Y) # absolute difference between fitted Y and Y #vecAbsDif # optional test for difference resSD<-lm(formula = vecAbsDif ~ poly(df$X, pwSD, raw=TRUE)) # curve fit abs(difference) to X #summary(resVar) # optional test show interim results # Extract Coefficients coefSD <- coef(summary(resSD))[1:(pwSD+1)] coefSD <- coefSD * sqrt(pi / 2) # coefficients for SD at x level coefSD # output coefficients for calculate SD depending on xThe coefficients for calculating SD of fitted Y according to x values are shown. The first item is the constant, the rest the coefficients for x1, x2 > coefSD # output coefficients for calculate SD depending on x. [1] -5295.423424 290.683788 -3.912461 Part 4: Interpret input data using the fitted coefficients#Calculate SD z and percentile of data for each input pair of x and y df$SD <- CalCurveFitValues(coefSD,df$X) # SD for x df$CILow <- df$RegY - z * df$SD # confidence interval for fitted Y (low) df$CIHigh <- df$RegY + z * df$SD # confidence interval for fitted Y (high) df$z <- (df$Y - df$RegY) / df$SD # (Y - fitted Y) / SD df$Pctile <- round(pnorm(df$z) *100, 1) # z converted to percentile df # show input data and all calculated valuesThe results are the appended table, as shown
> df # show input data and all calculated values X Y RegY SD CILow CIHigh z Pctile 1 37 3048 3080.636 103.71724 2877.354 3283.918 -0.31466118 37.7 2 36 2813 2795.181 98.64313 2601.844 2988.518 0.18063973 57.2 3 41 3622 3626.298 45.76447 3536.602 3715.995 -0.09392555 46.3 4 36 2706 2795.181 98.64313 2601.844 2988.518 -0.90407850 18.3 5 35 2581 2428.703 85.74409 2260.648 2596.758 1.77618073 96.2 6 39 3442 3451.290 90.39070 3274.128 3628.453 -0.10278087 45.9 7 40 3453 3557.898 71.99005 3416.800 3698.996 -1.45712224 7.3 8 37 3172 3080.636 103.71724 2877.354 3283.918 0.88089702 81.1 9 35 2386 2428.703 85.74409 2260.648 2596.758 -0.49802844 30.9 10 39 3555 3451.290 90.39070 3274.128 3628.453 1.14734769 87.4 11 37 3029 3080.636 103.71724 2877.354 3283.918 -0.49785155 30.9 12 37 3185 3080.636 103.71724 2877.354 3283.918 1.00623780 84.3 13 36 2670 2795.181 98.64313 2601.844 2988.518 -1.26903043 10.2 14 38 3314 3295.771 100.96643 3097.880 3493.661 0.18054603 57.2 15 41 3596 3626.298 45.76447 3536.602 3715.995 -0.66205182 25.4 16 38 3312 3295.771 100.96643 3097.880 3493.661 0.16073747 56.4 17 38 3414 3295.771 100.96643 3097.880 3493.661 1.17097419 87.9 18 41 3667 3626.298 45.76447 3536.602 3715.995 0.88936992 81.3 19 40 3643 3557.898 71.99005 3416.800 3698.996 1.18213138 88.1 20 33 1398 1409.861 36.47125 1338.378 1481.343 -0.32520229 37.3 21 38 3135 3295.771 100.96643 3097.880 3493.661 -1.59232038 5.6 22 39 3366 3451.290 90.39070 3274.128 3628.453 -0.94357530 17.3 Plotting the resultsThe data points and the regression line _ confidence intervals are plotted# Plotting # Create vectors of x and y for calculation of line coordinates # divided into 50 intervals #Produce table minv = min(df$X) maxv = max(df$X) arX <- seq(from = minv, to = maxv, length.out = 50) # 50 x values from min to max in sequence arY <- CalCurveFitValues(coefLine, arX) # Fitted y values to x arSD <- CalCurveFitValues(coefSD, arX) # Standard Deviations according to x arYLow <- arY - z * arSD # array of y for lower end of confidence interval arYHi <- arY + z * arSD # array of y for higher end of confidence interval # plotting begins par(pin=c(4.2, 3)) # set plotting window to 4.2x3 inches plot(x = df$X, # x y = df$Y, # y pch = 16, # size of dot xlab = "X", # x label ylab = "Y") # y lable lines(arX,arY) # line for fitted y RegY lines(arX,arYLow) # line for lower end of confidence interval lines(arX,arYHi) # line for higher end of confidence interval
Contents of D:3
Contents of E:4
Contents of F:5
|