Copyright 2014. All Rights Reserved.
StatsToDo : CUSUM for Negative Binomial Distributed Counts Explained and R Code

This page provides explanations and example R code for CUSUM quality control charts that detect changes in counts conforming to the Negative Binomial Distribution.

### CUSUM Generally

CUSUM is a set of statistical procedures used in quality control. CUSUM stands for Cumulative Sum of Deviations.

In any ongoing process, whether manufacturing or the delivery of services and products, once the process is established and running, the outcome should be stable and within defined limits near a benchmark. The situation is then said to be In Control.

When things go wrong, the outcomes depart from the defined benchmark. The situation is then said to be Out of Control.

In some cases, things go catastrophically wrong, and the outcomes depart from the benchmark in a dramatic and obvious manner, so that investigation and remedy follow. For example, a gear in an engine may fracture, causing the machine to seize. An example in health care is the employment of an unqualified fraud as a surgeon, followed by a sudden and massive increase in mortality and morbidity.

The detection of catastrophic departure from the benchmark is usually by the Shewhart Chart, not covered on this site. Usually, some statistically improbable outcome, such as two consecutive measurements outside 3 Standard Deviations, or 3 consecutive measurements outside 2 Standard Deviations, is used to trigger an alarm that all is not well.

In many instances, however, the departures from the outcome benchmark are gradual and small in scale, and these are difficult to detect. Examples are changes in the size and shape of products caused by progressive wear of machinery parts, reduced success rates over time when experienced staff are gradually replaced by novices in a work team, and increases in client complaints to a service department following a loss of adequate supervision.

CUSUM is a statistical process of sampling outcomes and summing their departures from the benchmark. When the situation is in control, the departures caused by random variations cancel each other numerically. In the out of control situation, departures from the benchmark tend to be unidirectional, so that the sum of departures accumulates until it becomes statistically identifiable.
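The cancelling and accumulating behaviour described above can be seen in a minimal sketch. The benchmark and both sets of measurements are invented for illustration and do not come from any real process.

```r
# Hypothetical benchmark and measurements, invented for illustration only
benchmark   <- 10
in_control  <- c(9, 11, 10, 12, 8, 10, 11, 9)     # scatter around the benchmark
out_control <- c(11, 12, 11, 13, 12, 11, 12, 12)  # consistently above the benchmark

# Cumulative sum of departures from the benchmark
cusum_in  <- cumsum(in_control - benchmark)
cusum_out <- cumsum(out_control - benchmark)

print(cusum_in)   # stays close to 0, as departures cancel each other
print(cusum_out)  # climbs steadily, as departures are all in one direction
```

This plain running sum shows only the principle. A working CUSUM chart also needs the reference value and decision interval.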

The mathematical process for CUSUM is in 2 parts. The part common to all CUSUM charts is the summation of departures from the benchmark (the CUSUM) and its graphical display. The part unique to each type of data is the calculation of the decision interval, abbreviated as DI or h, and the reference value, abbreviated as k, which continuously adjusts the CUSUM and its variance. The values of h and k depend on the following parameters:

• The in control values
• The out of control values
• The Type I Error or false positive rate, expressed as the Average Run Length, abbreviated as ARL: the number of samples expected before a false positive decision when the situation is in control. ARL is the inverse of the false positive rate, so a false positive rate of 1% corresponds to ARL=100.
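As a rough illustration of how k and h act on the running sum, the sketch below uses the commonly described one-sided decision-interval form, in which k is subtracted at every step, the sum is not allowed to fall below 0, and an alarm is raised when the sum exceeds h. The function names, the data and the values of k and h are hypothetical, and the R code provided with this page may arrange the calculation differently.

```r
# ARL as the inverse of the false positive rate
arl_from_rate <- function(false_positive_rate) 1 / false_positive_rate
print(arl_from_rate(0.01))  # a 1% false positive rate gives an ARL of 100

# One-sided (upper) decision-interval CUSUM, assumed form:
# each step adds (x - k), and the sum is reset to 0 if it would go negative
upper_cusum <- function(x, k, h) {
  s <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    s[i] <- max(0, prev + x[i] - k)
    prev <- s[i]
  }
  # alarm is the first sample where the sum exceeds h, or NA if it never does
  list(cusum = s, alarm = which(s > h)[1])
}

# Hypothetical measurements, reference value and decision interval
x   <- c(4, 5, 3, 6, 7, 8, 7, 9)
res <- upper_cusum(x, k = 5, h = 6)
print(res$cusum)
print(res$alarm)
```

With these invented values the sum stays at 0 for the first three samples and first exceeds h at the seventh.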

### Proportions

Proportions can be handled under 3 common types of distribution:
• The Binomial Distribution, where the measurement is the number of positive cases in a group of set sample size. The advantage of this approach is that the results tend to be stable, as short term variations are evened out over many cases. The disadvantage is that evaluation can only take place when the planned sample size per group has been reached, so conclusions tend to take a long time.
• The Negative Binomial Distribution, where the measurement is the number of negative cases observed before a set number of positive cases is reached. Evaluation can take place each time the set number of positive cases is reached, so conclusions can be reached sooner. However, the results tend to be more variable, as they are influenced by short term variations.
• The Bernoulli Distribution, where the measurement is either positive or negative for each case. Evaluation therefore takes place after each observation, so conclusions can be reached very quickly, but the results tend to be more chaotic, as they vary with each observation.

This page describes the Negative Binomial Distribution. CUSUM for the Binomial Distribution is discussed in the CUSUM for Binomial Distributed Proportion Explained Page, and that for the Bernoulli Distribution in the CUSUM for Bernoulli Distributed Proportion Explained Page.
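To make the difference between the three measurements concrete, the sketch below takes one hypothetical sequence of outcomes and summarises it in each of the three ways. The sequence, the group size of 5 and the choice of 2 positive cases per sample are assumptions made only for this illustration.

```r
# Hypothetical sequence of outcomes: 1 = positive case, 0 = negative case
outcomes <- c(0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0)

# Bernoulli: one measurement per case
bernoulli <- outcomes

# Binomial: number of positive cases in each group of fixed size
group_size <- 5
binomial <- colSums(matrix(outcomes, nrow = group_size))

# Negative Binomial: number of negative cases before each r-th positive case
r <- 2
pos_index    <- which(outcomes == 1)               # positions of positive cases
n_samples    <- length(pos_index) %/% r            # completed samples only
sample_end   <- pos_index[seq_len(n_samples) * r]  # case that closes each sample
sample_len   <- diff(c(0, sample_end))             # cases in each sample
neg_binomial <- sample_len - r                     # negative cases in each sample

print(binomial)
print(neg_binomial)
```

The fifth positive case and the cases after it do not yet form a complete sample, so they are left out of the Negative Binomial counts until another positive case arrives.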

### CUSUM for Counts based on the Negative Binomial Distribution

The Negative Binomial Distribution is based on the number of negative cases observed before a set number of positive cases is reached, and each sample is completed when the defined number of positive cases has been reached. An example is the Caesarean Section rate in many obstetric units, say 20%, which corresponds to an expected 4 normal deliveries for each Caesarean Section, 8 for 2 Caesarean Sections, and 12 for 3 Caesarean Sections. The number of positive cases (e.g. Caesarean Sections) is constant, and the number of negative cases (e.g. normal deliveries) is the measurement.
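The arithmetic of the Caesarean Section example can be checked with a short sketch. The two helper functions are hypothetical names introduced here, and they assume that each delivery is independently a Caesarean Section with the same probability.

```r
# Expected number of negative cases before the r-th positive case,
# when each case is positive with probability p
expected_negatives <- function(r, p) r * (1 - p) / p

# Inverse: proportion of positive cases implied by r positives and n negatives
implied_rate <- function(r, n) r / (r + n)

p_cs <- 0.20                          # Caesarean Section rate in the example
print(expected_negatives(1:3, p_cs))  # about 4, 8 and 12 normal deliveries
print(implied_rate(2, 8))             # 2 Caesarean Sections and 8 normal deliveries
```

The second function is useful when a benchmark is quoted as a rate but the CUSUM needs the expected count of negative cases, or the other way round.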

The Negative Binomial Distribution is an alternative to the Binomial Distribution for CUSUM of proportions. It is sometimes preferred because each sample is completed sooner: the results can be obtained as soon as the defined number of positive cases is reached, rather than waiting for all the cases in a defined sample size to be completed. The Negative Binomial Distribution can also be an alternative to the Poisson Distribution for CUSUM on counts, particularly if the assumption of the Poisson Distribution (variance=mean) cannot be met.
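The remark about the Poisson assumption can be illustrated numerically. The sketch below uses R's built-in `dnbinom` and `dpois` density functions to compare the variance of a Negative Binomial count with that of a Poisson count having the same mean. The values of r and p are assumptions chosen for illustration.

```r
r <- 2     # positive cases per sample (assumed)
p <- 0.2   # probability that a case is positive (assumed)

# dnbinom gives the probability of x negative cases before the r-th positive case
x      <- 0:5000                        # wide enough that the omitted tail is negligible
pmf_nb <- dnbinom(x, size = r, prob = p)

mean_nb <- sum(x * pmf_nb)
var_nb  <- sum((x - mean_nb)^2 * pmf_nb)

# A Poisson count with the same mean has variance equal to that mean
pmf_pois <- dpois(x, lambda = mean_nb)
var_pois <- sum((x - mean_nb)^2 * pmf_pois)

print(c(mean = mean_nb, var_negative_binomial = var_nb, var_poisson = var_pois))
```

The Negative Binomial variance is several times its mean here, which is the kind of overdispersion that a Poisson model cannot represent.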

The parameters required are

• The number of positive cases in each sample. This remains constant throughout a CUSUM project.
• The expected number of negative cases in each sample before the set number of positive cases is reached.
• The Average Run Length (ARL). This depends on a balance between the importance of detecting deviation and the cost of disruption in case of a false positive. The ARL in the Negative Binomial Distribution is based on the number of samples and not on the number of cases.

Please note that the algorithm on this page is intended for one tail monitoring, of either an increase or a decrease in the value. If the user intends two tail monitoring, to detect either an increase or a decrease, two CUSUM charts should be created, each with half the ARL of a one tail CUSUM.

Details of how the analysis is done and the results are described in the panel R Code Explained. Conceptually, the algorithm is as follows:

• The statistics are based on odds. If r=number of positive cases, and c=number of negative cases:
  • mean (mu, μ) = r / c
  • variance (v) = μ(1+1/c)
• The in control μ and v, the out of control μ, and the ARL are used to obtain the reference value (k) and decision interval (h), both expressed as odds
• The negative outcome count (n) obtained during monitoring is converted into odds, odds = r / n, which are then used to calculate the CUSUM
• The CUSUM chart is therefore one of cumulative changes in odds. If the negative count increases, the odds decrease. If the negative count decreases, the odds increase.
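The conversion steps listed above can be sketched as follows, using the mean and variance formulas exactly as stated. The values of r, the in control count and the monitored counts are hypothetical. The sketch stops at the conversion to odds and does not attempt to derive k or h.

```r
# Assumed setup: r positive cases per sample, c_in negative cases expected in control
r    <- 2
c_in <- 8

mu_in <- r / c_in                # mean as defined above
v_in  <- mu_in * (1 + 1 / c_in)  # variance as defined above

# Hypothetical negative counts obtained during monitoring, one per completed sample
n    <- c(8, 10, 6, 5, 4, 4, 3, 4)
odds <- r / n                    # each count converted into odds

# Fewer negative cases give larger odds, more negative cases give smaller odds
result <- data.frame(n = n,
                     odds = round(odds, 3),
                     departure = round(odds - mu_in, 3))
print(result)
```

In this invented series the negative counts fall over time, so the odds rise above the in control value of 0.25 and the departures become consistently positive.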

### References

Hawkins DM, Olwell DH (1997) Cumulative Sum Charts and Charting for Quality Improvement. Springer-Verlag, New York. ISBN 0-387-98365-1, p. 47-74, 147-148

Hawkins DM (1992) Evaluation of average run lengths of cumulative sum charts for an arbitrary data distribution. Communications in Statistics - Simulation and Computation 21(4): 1001-1020