Platinum
Display a Classification Plot
1. If the Predictions item (or whatever you named it) in the Experiments navigator is not already highlighted, click it.
2. Select Classification Plot from the Predict menu, or right-click the item and select Classification Plot from the shortcut menu. The Classification Plot is displayed showing the predicted classes, the raw votes of the component classifiers and other information.
3. From the Comparison Variable drop-down list box in the upper right corner, select test classes. Some of the rectangles in the view turn red, signifying misclassifications.
Interpretation
This is a very rich display, and it may take some experience before you are able to interpret it easily.
Each row represents a sample. On the left of each row are the sample name and the prediction (predicted class). The rest of the display consists of boxes representing the outputs of the artificial neural networks for each of the possible classes for that sample.
Each column represents a class. The colors of the boxes are significant:
A box highlighted in dark green is the predicted class for that sample.
A box highlighted in red is the true class of that sample if one is known. (See the discussion in Step 10 about observations of ‘Unknown’.) A sample that has both a dark green box and a red box has been predicted incorrectly. If the classifier predicts the sample class correctly, or if the correct value is not known, only a dark green box appears.
A box that is colored gray represents neither the predicted class nor the true class.
If GeneLinker™ refuses to make a prediction for a sample, it will have 'Unknown' listed under prediction and no dark green box.
If the sample's true class is 'Unknown', it will not have a red box. (This will not happen when viewing training data since true classes must be known for all training samples.)
Hence the number of red boxes in the display indicates the number of samples whose known class was not predicted correctly, whether they were misclassified or left ‘Unknown’. Reducing the rate of misclassifications is discussed below.
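The coloring rules above can be summarized as a small decision function. This is an illustrative sketch only; the function name and color strings are not part of GeneLinker™:

```python
# Sketch of the box-coloring rules described above (hypothetical, not
# GeneLinker's actual code). For each class box of each sample:
#   - the predicted class is dark green,
#   - the true class (if known and different from the prediction) is red,
#   - all other boxes are gray.

def box_color(box_class, predicted, true_class):
    """Return the display color for one class box of one sample.
    true_class is None when the sample's true class is 'Unknown'."""
    if box_class == predicted:
        return "dark green"   # predicted class
    if true_class is not None and box_class == true_class:
        return "red"          # known true class that was not predicted
    return "gray"             # neither predicted nor true class

# A correctly predicted sample shows only a dark green box:
print(box_color("EWS", predicted="EWS", true_class="EWS"))  # dark green
# A misclassified sample shows a red box for its true class:
print(box_color("RMS", predicted="NB", true_class="RMS"))   # red
```

Note that when the prediction and the true class coincide, the single box is dark green rather than red, matching the rule that a correctly classified sample shows no red box.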
Component Classifier Votes
Inside each box is a representation of the votes of each of the neural networks in the committee. Each of 10 neural networks was trained on a different 90% of the training data. Each of the horizontal rectangles in the view above represents the output of all 10 neural networks for a given class on a given sample. If all 10 neural networks are in agreement (i.e. have the same output value) then there will be a solid bar - at the right end if they all have high output (i.e. that is the sample's class), at the left end if they all have low output (i.e. that is not the sample's class).
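One way to produce such training subsets, where each of the 10 networks sees a different 90% of the data, is to shuffle the samples once and hold out a different tenth for each committee member. This is a sketch under that assumption; GeneLinker™'s actual partitioning scheme may differ, and `committee_splits` is an illustrative name:

```python
import random

def committee_splits(samples, k=10, seed=0):
    """Yield k training subsets, each omitting a different 1/k of the
    samples, so every committee member trains on a different 90%."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)          # one fixed shuffle
    folds = [idx[i::k] for i in range(k)]     # k disjoint held-out folds
    for held_out in folds:
        yield [samples[i] for i in idx if i not in held_out]

samples = [f"S{i}" for i in range(20)]
splits = list(committee_splits(samples))
print(len(splits), len(splits[0]))  # 10 subsets, each of 18 samples (90%)
```

Each subset would then be used to train one neural network; the 10 trained networks together form the committee whose outputs fill one horizontal rectangle per class.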
Class Prediction Process
The class prediction (or call) is done by a simple vote. For a given sample, each neural network votes for the class with the highest output. If 2/3 (default setting) of the networks agree on a single class, we call that a prediction. In any other case, no prediction is made and the sample is labelled 'Unknown'.
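The voting procedure just described can be sketched as follows. The function name is illustrative, and the network outputs are represented as simple class-to-value dictionaries:

```python
from collections import Counter

def committee_call(network_outputs, threshold=2/3):
    """network_outputs: one dict of class -> output value per network.
    Each network votes for its highest-output class; a prediction is
    made only if at least a threshold fraction of networks agree."""
    votes = Counter(max(out, key=out.get) for out in network_outputs)
    winner, count = votes.most_common(1)[0]
    if count >= threshold * len(network_outputs):
        return winner
    return "Unknown"

# Eight of ten networks favor EWS, clearing the 2/3 threshold:
outs = [{"EWS": 0.9, "BL": 0.1}] * 8 + [{"EWS": 0.2, "BL": 0.8}] * 2
print(committee_call(outs))  # EWS
```

With the default 2/3 threshold, a 6-to-4 split of the votes yields ‘Unknown’; lowering the threshold to 0.5 would turn the same split into a call, which is relevant to the discussion of TEST-10 below.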
Example:
Look at TEST-10 in the image above. Because fewer than 2/3 of the neural networks agreed on a single class, 'Unknown' was entered as the prediction. However, the display contains more information about TEST-10 than just the fact that it was not classified.
Look at the outputs for class BL: the box in the second column. There is a solid gray bar at the left end of the histogram - this indicates that the ANN outputs for that class were uniformly zero. None of the neural networks gave any weight to classifying the sample as BL. Under class EWS, the results were almost the same: one or two ANNs gave a result only marginally greater than zero. In other words, the ANNs were unanimous that the sample did not fall into the BL or EWS classes.
The ANN outputs for the other two classes are mixed - some ANNs voted for NB and some for RMS. In the context of the input genes, we conclude that the sample more nearly resembles RMS and NB than it does EWS or BL. In other words, the sample lies somewhere near the decision boundary between classes RMS and NB.
As the red box indicates, the true class for this sample is RMS. Perhaps if we had set the voting threshold lower, at around 50%, the classifier would have made a prediction of RMS for this sample.
The other sample that was not given a prediction (or was predicted to be ‘Unknown’, if you wish) was TEST-11. Interestingly, TEST-11 was one of the five test samples that did not fall into the original four training classes: it was a non-SRBCT cancer sample.
Reasons For Misclassifications
There are often no misclassifications in the training data – artificial neural networks are fairly powerful and adaptable learners. If there are misclassifications, however, it may be for one of several possible reasons:
We may be using a set of genes which do not discriminate between the sample classes.
The training set may be unbalanced. That is, it may have too many examples of one class and not enough of another.
We may have set the number of hidden units in the neural networks too small.
The data may contain errors such as mislabelled samples or incorrect measurements.
The voting threshold may be set too low.
The stopping criteria may have been set too loose (maximum iterations too small).
The above reasons may affect either training or test results. If the training results are excellent but the test results are poor, it may be for one of the following additional reasons:
The test data may be drawn from a significantly different population than the training data (such as the non-SRBCTs in the example above).
The test data may not have been normalized in a similar fashion to the training data.
The test dataset may have been filtered with different genes than the training dataset. (GeneLinker™ checks only that the number of genes used in training and prediction is the same, not their identities).
We may have set the number of hidden units in the neural networks too large.
We may have too many features (genes) for the number of samples in the training set.
The stopping criteria may have been set too tight (maximum iterations too large).
These last three conditions correspond to a condition called ‘overtraining’. You can think of this as analogous to a child learning a certain set of examples by rote but being unable to generalize from them to new cases. When a neural network is either given too much memory for detail (too many hidden nodes or input nodes) or is forced to learn the input examples too well (stopping criteria too tight), it may simply ‘memorize’ the training data to the detriment of generalizing well on test data.