Platinum

Tutorial 6: Step 8 Create an ANN Classifier

ANN Classifier Structure

GeneLinker™'s Artificial Neural Networks consist of three layers of nodes or neurons.

The input layer is connected to the output layer via a hidden, or internal, layer. The input layer has a single node per gene, so if you have eight genes that you want to train the ANNs on, GeneLinker™ automatically builds networks with eight input nodes. The output layer has a single node per class, so if the data have four classes, GeneLinker™ automatically builds networks with four output nodes. The number of nodes in the hidden layer should be greater than or equal to the number of nodes in the input layer, and fewer than twice the number of nodes in the input layer. Too many nodes in the hidden layer result in poor training performance, and too few result in poor classification performance.
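The sizing rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not GeneLinker™ code: the function name network_shape and the dictionary keys are hypothetical.

```python
def network_shape(n_genes, n_classes):
    """Return input/output node counts and the hidden-layer sizes
    allowed by the guideline: n_genes <= hidden < 2 * n_genes."""
    if n_genes < 1 or n_classes < 1:
        raise ValueError("need at least one gene and one class")
    # One input node per gene, one output node per class.
    hidden_options = list(range(n_genes, 2 * n_genes))
    return {
        "input": n_genes,
        "output": n_classes,
        "hidden_options": hidden_options,
    }


# Eight genes and four classes, as in the example in the text.
shape = network_shape(n_genes=8, n_classes=4)
print(shape["input"], shape["output"])  # 8 4
print(shape["hidden_options"])          # [8, 9, 10, 11, 12, 13, 14, 15]
```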

Because individual ANNs can sometimes perform poorly on certain inputs, having a committee architecture improves the reliability of classification. Typically 10 is a reasonable number of committee members, with the requirement that 80% of committee members agree for a classification to be made. For a complete description of all of the parameters for creating an ANN committee classifier, please see Creating an ANN Classifier.
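The agreement requirement can be illustrated with a minimal Python sketch. It assumes each committee member reports a single class label and that a sample is left unclassified (None) when too few members agree. How GeneLinker™ combines member outputs internally is not described here, so the function committee_vote and the labels "A" and "B" are hypothetical.

```python
from collections import Counter


def committee_vote(member_labels, min_agreement_pct=80):
    """Return the most common label if at least min_agreement_pct percent
    of the committee members chose it; otherwise return None."""
    if not member_labels:
        raise ValueError("committee has no members")
    label, votes = Counter(member_labels).most_common(1)[0]
    # Integer comparison avoids floating-point rounding at the threshold.
    if votes * 100 >= min_agreement_pct * len(member_labels):
        return label
    return None


# A committee of 10: eight members agree, so the 80% threshold is met.
print(committee_vote(["A"] * 8 + ["B"] * 2))  # A
# Only seven members agree, so no classification is made.
print(committee_vote(["A"] * 7 + ["B"] * 3))  # None
```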

Create an ANN Classifier

1. Click the Filtered: keep {Tutorial 6 list} item under the Khan_training_data item in the Experiments navigator. The item is highlighted.

2. Click the Create Classifier toolbar icon, or select Create Classifier from the Predict menu, or right-click the item and select Create Classifier from the shortcut menu. The Create Classifier parameters dialog is displayed.

3. Set dialog parameters.

Parameter	Setting
Representative Variable	training classes
Training Parameters: Hidden Units	5
Miscellaneous: Random Seed	999 (See Note below)

4. Accept the default values for all other parameters and click OK. The Create Classifier operation is performed, and a new item (ANN: training classes | 8-5-4 | N=10 | 0.0010 | 10) is added under the Khan_training_data Filtered: keep {Tutorial 6 list} item in the Experiments navigator.

If you have automatic visualizations enabled in your user preferences, the Classification plot showing training results is displayed.

Training Parameters

The number of classifiers (10) is arbitrary. The number of hidden units (5) is more significant. Using more hidden units than there are classes (i.e. 4 in this example) is a little risky but not wrong. In this case the number of hidden units is the number of classes we're really dealing with: 4 SRBCTs plus 1 class for the non-SRBCT samples in the test dataset.

Note: For reasons discussed in 'Tutorial 6: Step 5 Run SLAM', setting the random seed is neither necessary nor recommended in normal use. In the Create Classifier function, the random seed determines how the samples are divided up into subsets for training the component learners (committee members). It also determines how the individual learners (neural nets) are initialized. The random seed generally only affects predictions for borderline or ambiguous samples, which the committee also helps diagnose.
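The role of the seed can be sketched generically in Python. The actual partitioning and initialization schemes used by GeneLinker™ are not described here, so everything below is an assumption made for illustration: the function seeded_setup is hypothetical, samples are dealt round-robin into one subset per committee member, and initial weights are drawn uniformly from [-0.5, 0.5]. The point is only that a fixed seed makes both steps repeatable.

```python
import random


def seeded_setup(n_samples, n_members, n_weights, seed):
    """Split sample indices into one subset per committee member and draw
    initial weights for each member, all from one seeded generator."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    # Deal the shuffled indices round-robin into n_members subsets.
    subsets = [order[i::n_members] for i in range(n_members)]
    # Hypothetical initialization: small uniform random weights.
    weights = [
        [rng.uniform(-0.5, 0.5) for _ in range(n_weights)]
        for _ in range(n_members)
    ]
    return subsets, weights


# Sample and weight counts here are made up for the illustration.
first = seeded_setup(n_samples=20, n_members=10, n_weights=3, seed=999)
second = seeded_setup(n_samples=20, n_members=10, n_weights=3, seed=999)
print(first == second)  # True: the same seed reproduces the same setup
```

Running the function with a different seed will generally give a different split and different starting weights, which is why predictions for borderline samples can change between runs.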

For a discussion of the other parameters in this dialog, see Create Classifier.

It is possible to view the results of the classifier training at this point (see Classifier Plot Training Results), but it is even more informative to go on and test the classifier using data it has not already seen.