StatsToDo : Classification by Bayes Probability Explained
This page provides explanation and support for the two programs in Classification by Basic Bayes Probability Program Page and Classification by Naive Bayes Probability Program Page . As the programs and this explanation page use specific terms and abbreviations, and these are best demonstrated with examples, this introduction panel will describe the example used and the terminology.

The format of data entry and explanation of results produced are in the Help and Hints panel of the program pages.

• The Simple Bayes panel provides the calculations used in the one-predictor model, the simplest Bayesian model.
• The Basic Bayes panel provides the same calculations, but in a two-predictor model, the default example in the Classification by Basic Bayes Probability Program Page .
• The Naive Bayes panel provides calculations used in the Naive Bayes model, the default example in the Classification by Naive Bayes Probability Program Page .
• The Discussions panel provides detailed explanations and comparisons of the two models, an extension of this Introduction panel.
• The References panel presents some references that may be useful to the users.
The remainder of this panel provides a description of the example data used in this and the two program pages, and brief explanations for the terms used in these pages.

### The Example

The same example is used in the two program pages and this explanation page.
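| Hair (predictor) | Eye (predictor) | Pattern | French (outcome) | German (outcome) | Italian (outcome) |
|---|---|---|---|---|---|
| Dark (+-) | Brown (+--) | +-+-- | 1 | 1 | 3 |
| Dark (+-) | Blue (-+-) | +--+- | 3 | 1 | 2 |
| Dark (+-) | Others (--+) | +---+ | 1 | 1 | 1 |
| Light (-+) | Brown (+--) | -++-- | 2 | 1 | 2 |
| Light (-+) | Blue (-+-) | -+-+- | 2 | 1 | 1 |
| Light (-+) | Others (--+) | -+--+ | 1 | 5 | 1 |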

We wish to develop a Bayesian model to identify the ethnicity of people, based on hair and eye color. To build our model, we recruited 10 people each of known French, German, and Italian ethnicity, and observed their hair and eye color. We then use the Bayesian model to predict ethnicity from hair and eye color, in a community with an expected French:German:Italian ratio of 3:2:1. In addition, we introduced a bias in our prediction, as cost ratios of 1:2:1 for French:German:Italian. The counts for each combination are presented in the table above, and the terms and abbreviations used are explained below.

### Terminology

The following terms are used to explain and present the results of the Bayesian Probability models on StatsToDo. Details of the calculations and explanations are presented when each model is discussed in the following panels.

The Outcome (o) is what we want to predict. In this example it consists of 3 mutually exclusive ethnicities: French, German, and Italian.

The predictors are what we use to predict the outcome, in our example hair and eye color.

• Each predictor has two or more mutually exclusive attributes (a), in our example dark and light for hair color, and brown, blue, and others for eye color. If one of the attributes in a predictor is positive (+), all others are negative (-). StatsToDo uses combinations of +s and -s to represent each predictor. In hair color, +- for dark and -+ for light. In eye color, +-- for brown, -+- for blue, and --+ for others.
• When there is more than one predictor, the combination of attributes forms a pattern (p). The number of possible patterns is the product of the number of attributes in all predictors. StatsToDo concatenates the +/- symbols of each predictor to represent a pattern. In our example, there are 2x3=6 patterns of hair and eye color: dark_brown (+-+--), dark_blue (+--+-), dark_others (+---+), light_brown (-++--), light_blue (-+-+-), light_others (-+--+).
• Please note that the use of a text string of +s and -s to represent predictors is not a universal practice. It was developed specifically for StatsToDo, as a compromise between the needs of brevity, clarity, and transferring large tables of data to and from web pages (copy and paste).
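
The encoding described above can be sketched in a few lines of Python. This is only an illustration of the notation, not code from StatsToDo; the names `PREDICTORS`, `encode_attribute`, and `encode_pattern` are hypothetical, and the attribute order is assumed to follow the order used on this page.

```python
from itertools import product

# Predictors and their attributes, in the order used on this page (assumed).
PREDICTORS = {
    "hair": ["dark", "light"],
    "eye": ["brown", "blue", "others"],
}

def encode_attribute(predictor, attribute):
    """Return the +/- string for one predictor: '+' for the observed attribute, '-' for the rest."""
    return "".join("+" if a == attribute else "-" for a in PREDICTORS[predictor])

def encode_pattern(**observed):
    """Concatenate the +/- strings of all predictors, in PREDICTORS order, into one pattern."""
    return "".join(encode_attribute(name, observed[name]) for name in PREDICTORS)

# Every combination of attributes: 2 x 3 = 6 patterns.
all_patterns = [
    encode_pattern(hair=h, eye=e)
    for h, e in product(PREDICTORS["hair"], PREDICTORS["eye"])
]

print(encode_pattern(hair="dark", eye="brown"))  # +-+--
print(len(all_patterns))                         # 6
```

Adding a third predictor to `PREDICTORS` would multiply the number of patterns by its number of attributes, which is the product rule stated above.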

Prediction. In the Bayesian sense, prediction is neither foretelling the future nor a method of discovering what may be true. It is the mathematical manipulation of probabilities, before and after the application of predictors. In our example, prediction is estimating the probability of a person being French, German, or Italian, based on observing his/her hair and eye color.

Probability is a number between 0 (0%, no confidence) and 1 (100%, certainty). Although computationally it is treated the same as probability in other domains, in the Bayesian sense probability represents how confident we are in our conclusions. The following probabilities and their abbreviations are used in StatsToDo.

• Probabilities calculated during model development
  • The probability of an outcome is P(o), of being positive in an attribute P(+) or P(a), and of a pattern of positives and negatives P(p).
  • The Bayesian model uses conditional probability, the probability of one thing in the presence of (given) another. This is abbreviated as P(y|x), the probability of y given x.
  • The probability of being positive in an attribute given an outcome, P(+|o) or P(a|o), is estimated using reference data during model development. A collection of P(+|o)s forms the coefficients of the Bayesian model when there is one predictor.
  • The probability of a pattern (a combination of positives and negatives in a list of attributes) given an outcome, P(p|o), is estimated using reference data during model development. A collection of P(p|o)s forms the coefficients of the Bayesian model when there are multiple predictors.
  • The probability of an outcome given a positive attribute, P(o|+), or given a pattern, P(o|p), also called Maximum Likelihood, describes the model. This is calculated from the coefficients of the model developed, without any other considerations.
• Probabilities when using the model
  • The a priori probability π is the probability of outcomes before we apply the Bayesian model. In our example, we expect the background ratio of French:German:Italian to be 3:2:1, transformed (by dividing each by the total) to a priori probabilities of 0.5:0.333:0.167.
  • The a posteriori probability P(o|+,π), P(o|p,π) is the probability we predict, altering the a priori probability using our Bayesian model. This is done with the attribute that is positive when using a single predictor (P(o|+,π)), and with a pattern of attributes when using multiple predictors (P(o|p,π)). The a posteriori probability is the Bayesian Probability, what this page is all about.
  • The a posteriori probability may include a bias, using cost coefficients c, P(o|+,π,c), P(o|p,π,c). In this example, we assigned the relative cost (importance) to French:German:Italian of 1:2:1, transformed (by dividing each by the total) to cost coefficients of 0.25:0.5:0.25.
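
As a rough illustration of how these probabilities relate, the sketch below works through the example counts in Python. It is not the StatsToDo implementation: the names `COUNTS`, `normalize`, and `probability_of_outcome` are hypothetical, P(o|p) is assumed to be the P(p|o) values rescaled to sum to 1, and the cost coefficients are assumed to act as extra multiplicative weights on each outcome. The detailed calculations are given in the following panels.

```python
# Reference counts from the example table: pattern -> (French, German, Italian)
OUTCOMES = ("French", "German", "Italian")
COUNTS = {
    "+-+--": (1, 1, 3),
    "+--+-": (3, 1, 2),
    "+---+": (1, 1, 1),
    "-++--": (2, 1, 2),
    "-+-+-": (2, 1, 1),
    "-+--+": (1, 5, 1),
}

def normalize(values):
    """Divide each value by the total so that the results sum to 1."""
    total = sum(values)
    return [v / total for v in values]

# Number of reference subjects per outcome (10 each in this example).
totals = [sum(row[i] for row in COUNTS.values()) for i in range(len(OUTCOMES))]

# Model coefficients P(p|o): share of each outcome group showing each pattern.
p_given_o = {p: [n / t for n, t in zip(row, totals)] for p, row in COUNTS.items()}

prior = normalize([3, 2, 1])  # a priori probabilities, about 0.5 : 0.333 : 0.167
cost = normalize([1, 2, 1])   # cost coefficients, 0.25 : 0.5 : 0.25

def probability_of_outcome(pattern, *weights):
    """Rescale P(p|o), multiplied by any weight lists given (prior, cost), to sum to 1.

    Treating cost as a multiplicative weight is an assumption of this sketch.
    """
    scores = list(p_given_o[pattern])
    for w in weights:
        scores = [s * x for s, x in zip(scores, w)]
    return normalize(scores)

pattern = "-+--+"  # light hair, other eye color
print([round(x, 3) for x in probability_of_outcome(pattern)])               # P(o|p):     [0.143, 0.714, 0.143]
print([round(x, 3) for x in probability_of_outcome(pattern, prior)])        # P(o|p,π):   [0.214, 0.714, 0.071]
print([round(x, 3) for x in probability_of_outcome(pattern, prior, cost)])  # P(o|p,π,c): [0.125, 0.833, 0.042]
```

For this pattern the reference data favor German (5 of 10 Germans against 1 of 10 in each other group). Under these assumptions the a priori ratio of 3:2:1 moves some probability from Italian to French, and the cost ratio of 1:2:1 increases the probability of German further.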