Copyright © 2020. All Rights Reserved.
StatsToDo : CUSUM for Inverse Gaussian Distributed Measurements Explained and R Code

This page provides explanations and example R code for CUSUM quality control charts, for detecting changes in measurements that conform to the Inverse Gaussian Distribution.

### CUSUM Generally

CUSUM is a set of statistical procedures used in quality control. CUSUM stands for Cumulative Sum of Deviations.

In any ongoing process, be it manufacture or delivery of services and products, once the process is established and running, the outcome should be stable and within defined limits near a benchmark. The situation is said to be In Control.

When things go wrong, the outcomes depart from the defined benchmark. The situation is then said to be Out of Control.

In some cases, things go catastrophically wrong, and the outcomes depart from the benchmark in a dramatic and obvious manner, so that investigation and remedy follow. For example, the gear in an engine may fracture, causing the machine to seize. An example in health care is the employment of an unqualified fraud as a surgeon, followed by a sudden and massive increase in mortality and morbidity.

Catastrophic departures from the benchmark are usually detected with the Shewhart Chart, which is not covered on this site. Usually, some statistically improbable outcome, such as two consecutive measurements outside 3 Standard Deviations, or 3 consecutive measurements outside 2 Standard Deviations, is used to trigger an alarm that all is not well.

In many instances, however, the departures from the outcome benchmark are gradual and small in scale, and these are difficult to detect. Examples are changes in the size and shape of products caused by progressive wearing out of machinery parts, reduced success rates over time when experienced staff are gradually replaced by novices in a work team, and increases in client complaints to a service department following a loss of adequate supervision.

CUSUM is a statistical process of sampling outcomes and summing their departures from the benchmark. When the situation is in control, the departures caused by random variations cancel each other numerically. In the out of control situation, departures from the benchmark tend to be unidirectional, so that the sum of departures accumulates until it becomes statistically identifiable.
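The cancelling and accumulating behaviour can be seen in a minimal R sketch. The benchmark and both sets of measurements are made up purely for illustration, and the variable names are assumptions of this example.

```r
# Illustrative only: made-up measurements around a benchmark of 10.
benchmark <- 10

# In control: deviations of mixed sign that roughly cancel.
in_control  <- c(10.2, 9.7, 10.1, 9.9, 10.3, 9.8, 10.0, 10.0)
# Out of control: a small but persistent upward shift.
out_control <- c(10.4, 10.1, 10.5, 10.2, 10.4, 10.3, 10.2, 10.3)

# Cumulative sum of departures from the benchmark.
cusum_in  <- cumsum(in_control  - benchmark)
cusum_out <- cumsum(out_control - benchmark)

print(round(cusum_in, 1))   # hovers around 0
print(round(cusum_out, 1))  # climbs steadily
```

No single out of control value here looks alarming on its own, yet the running sum keeps growing, which is what makes the drift detectable.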

The mathematical process for CUSUM is in 2 parts. The common part is the summation of departures from the benchmark (the CUSUM) and its graphical display. The unique part is the calculation of the decision interval, abbreviated as DI or h, and the reference value, abbreviated as k, which continuously adjusts the CUSUM and its variance. The values of h and k depend on the following parameters:

• The in control values
• The out of control values
• The Type I Error or false positive rate, expressed as the Average Run Length (ARL): the number of samples expected before a false positive decision when the situation is in control. ARL is the inverse of the false positive rate, so a false positive rate of 1% corresponds to ARL = 100.
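How k and h act on the running sum can be sketched with one common textbook form of the one-sided (upward) CUSUM: k is subtracted from every measurement, the sum is never allowed to fall below 0, and an alarm is raised once the sum exceeds h. The function name `upper_cusum`, the data, and the values of k and h below are hypothetical and chosen only to show the mechanics; they are not derived from any in control or out of control parameters.

```r
# One-sided (upward) decision-interval CUSUM, a common textbook form.
# x: monitoring measurements; k: reference value; h: decision interval.
upper_cusum <- function(x, k, h) {
  cusum <- numeric(length(x))
  previous <- 0
  for (i in seq_along(x)) {
    # Subtract k from each measurement and never let the sum fall below 0.
    cusum[i] <- max(0, previous + x[i] - k)
    previous <- cusum[i]
  }
  list(cusum = cusum, alarm = which(cusum > h))
}

# Hypothetical values chosen only to show the mechanics.
x <- c(10.1, 9.8, 10.2, 10.6, 10.7, 10.5, 10.8, 10.6)
result <- upper_cusum(x, k = 10.25, h = 1)
print(result$cusum)
print(result$alarm)   # sample numbers at which the CUSUM exceeds h

# ARL as the inverse of the false positive rate per sample.
false_positive_rate <- 0.01
arl <- 1 / false_positive_rate
print(arl)
```

A larger h makes false alarms rarer (a longer ARL) at the price of slower detection, which is why h has to be chosen together with k and the required ARL.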

### Data with skewed distribution

All unidirectional measurements are ratios against a known standard, and have positive values. In most cases, the range of measurement is sufficiently far from 0, and the variability sufficiently narrow, that the errors incurred by assuming the normal distribution are trivial, and statistical analysis can be carried out with that assumption. When the values are close to zero and the variation is wide, the assumption of normal distribution becomes invalid, as the probability curve becomes skewed, with the mode near the lower values and a long tail towards the higher values.

An example of this is in time measurements. When we consider the age of pregnant women, the range is roughly 20 to 40 years, with the mode near the mean. The skew is so minor that in most cases the age of the mother can be treated as normally distributed.

However, if we consider the duration of labour, the range is from a few minutes to over 24 hours, with most women delivering within 4 to 8 hours. The mode is therefore near the lower values, and there is a long tail of higher values. Statistical analysis based on the assumption of normality will likely lead to misinterpretations.
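A small R sketch shows the kind of misinterpretation that can arise. The durations below are made up to mimic a right-skewed pattern and are not real labour data; the variable names are assumptions of this example.

```r
# Made-up labour durations in hours, for illustration only (not real data).
duration <- c(0.5, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 12, 18, 26)

centre_mean   <- mean(duration)     # pulled upward by the long right tail
centre_median <- median(duration)   # stays near the bulk of the data

# A normal-theory "mean minus 2 SD" lower limit.
lower_limit <- centre_mean - 2 * sd(duration)

round(c(mean = centre_mean, median = centre_median, lower_limit = lower_limit), 2)
```

The mean sits above the median, and the normal-theory lower limit comes out negative, which is an impossible duration.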

### Dealing with skewed data

A set of skewed data can be transformed so that its distribution becomes closer to the normal distribution. The common transformations are the log and the square root, as these stretch the intervals amongst the lower values and shrink the intervals amongst the higher values. For more complex skews, the Box Cox transformation can be used. The data are transformed, the analysis assumes normal distribution, and the results are reverse transformed to the original units. Data transformation is discussed in the Numerical Transformation Explained Page, and programs for it are in the Numerical Transformation Program Page.
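The transform, analyse, reverse transform sequence might look like the following in R. This is a sketch under stated assumptions: the data are made up, the "analysis" is reduced to taking a mean, and `box_cox` and `inverse_box_cox` are hypothetical helper names written out from the usual Box Cox formula rather than taken from any package.

```r
# Made-up, right-skewed positive measurements (hours), for illustration only.
x <- c(0.5, 2, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 12, 18, 26)

# Log transformation: analyse on the log scale, then reverse transform.
mean_log_scale <- mean(log(x))
back_from_log  <- exp(mean_log_scale)   # the geometric mean, in hours

# Square root transformation, reversed by squaring.
back_from_sqrt <- mean(sqrt(x))^2

# Box Cox transformation for a given power; power = 0 is defined as the log.
box_cox <- function(x, power) {
  if (power == 0) log(x) else (x^power - 1) / power
}
# Reverse of the Box Cox transformation.
inverse_box_cox <- function(y, power) {
  if (power == 0) exp(y) else (power * y + 1)^(1 / power)
}

round(c(arithmetic = mean(x), via_sqrt = back_from_sqrt, via_log = back_from_log), 2)
```

Both reverse transformed means are smaller than the arithmetic mean, because the transformations reduce the pull of the long upper tail.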

### CUSUM for the Inverse Gaussian Distribution

When the data are continuous measurements with all positive values and a skew, the Inverse Gaussian distribution is the most appropriate model to use for CUSUM, as the model allows a precise prediction of what the data should be.

Details of how the analysis is done and the results are described in the panel R Code Explained. Conceptually, the algorithm is as follows:

• Before monitoring begins, a set of reference data must be obtained so that the central location (mu, μ) and the shape parameter that governs the skew (lambda, λ) can be estimated. These become the in control mu and lambda.
• The out of control location (mu) is then nominated.
• With the in and out of control parameters, and the average run length, the reference value (k) and decision interval (h) can be computed.
• CUSUM can then be calculated and plotted using monitoring data.
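The steps above can be sketched in R as follows. This is a minimal sketch, not the page's own R code, and it rests on several assumptions: all data are made up, the out of control location `mu1` is arbitrarily set 50% above `mu0`, and `h = 10` is a placeholder, because a real decision interval has to be derived from the required ARL. The estimates of mu and lambda are the standard maximum likelihood formulas for the Inverse Gaussian distribution. The reference value k = 2 μ0 μ1 / (μ0 + μ1) is what the log likelihood ratio of the two Inverse Gaussian densities gives when lambda is held fixed and only the location shifts upward.

```r
# Reference (in control) data: made-up positive, right-skewed values.
reference <- c(3.1, 4.8, 2.6, 7.9, 5.2, 3.7, 12.4, 4.1, 6.3, 2.9, 5.6, 9.0)

# Maximum likelihood estimates for the Inverse Gaussian distribution:
# mu is the sample mean; 1/lambda is the mean of (1/x - 1/mu).
mu0    <- mean(reference)
lambda <- 1 / mean(1 / reference - 1 / mu0)

# Nominated out of control location (hypothetical: a 50% increase).
mu1 <- 1.5 * mu0

# Reference value k from the log likelihood ratio of mu1 against mu0.
k <- 2 * mu0 * mu1 / (mu0 + mu1)

# Placeholder decision interval; a real h must be derived from the ARL.
h <- 10

# Upward CUSUM on made-up monitoring data.
monitoring <- c(4.2, 6.8, 5.1, 9.3, 7.7, 11.2, 8.4, 10.6)
cusum <- numeric(length(monitoring))
previous <- 0
for (i in seq_along(monitoring)) {
  cusum[i] <- max(0, previous + monitoring[i] - k)
  previous <- cusum[i]
}

first_alarm <- which(cusum > h)[1]   # NA if the CUSUM never exceeds h
print(round(c(mu0 = mu0, lambda = lambda, k = k), 2))
print(round(cusum, 2))
print(first_alarm)
```

To draw the chart, `plot(cusum, type = "b")` followed by `abline(h = h, lty = 2)` plots the CUSUM against sample number with the decision interval as a dashed line. For a downward shift (mu1 below mu0) the sign of the increment would be reversed, accumulating `k - x` instead.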

### References

Hawkins DM, Olwell DH (1997) Cumulative sum charts and charting for quality improvement. Springer-Verlag, New York. ISBN 0-387-98365-1. p 47-74, 147-148

Hawkins DM (1992) Evaluation of average run lengths of cumulative sum charts for an arbitrary data distribution. Communications in Statistics - Simulation and Computation 21(4): 1001-1020