In this step, gene list filtering is used to create new datasets from the training and test datasets. Each new dataset contains only the expression values for the genes in the gene list. This ensures that the dataset used to train the ANN classifier contains the same genes as the test dataset.
Note: Gene list filtering does not change the order of genes in a dataset. To be classified by an ANN classifier, the test dataset must contain the same genes as the training dataset, in the same order, and with no extra genes.
Filter Original Datasets Using the Gene List
Follow the procedure for the 'Khan_training_data' dataset and then repeat it for 'Khan_test_data'.
1. Click the 'Khan_training_data' ('Khan_test_data' for the second filter) item in the Experiments navigator. The item is highlighted.
2. Click the Filter toolbar icon, or select Filter Genes from the Data menu, or right-click the item and select Filter Genes from the shortcut menu. The Filter Genes parameters dialog is displayed.
3. Set dialog parameters.
Parameter | Setting
Filtering Operation | Gene List Filtering
Filtering Operation Type | Keep only genes that are in this list
Gene List | Tutorial 6 List
4. Click OK. The gene list filtering operation is performed, and a new item (Filter Genes) is added under the 'Khan_training_data' ('Khan_test_data') item in the Experiments navigator.
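For readers who want to see the logic of the "Keep only genes that are in this list" operation outside the application, the following is a minimal Python sketch. It is not the application's own implementation. The function name `filter_by_gene_list`, the data layout (a list of gene name and expression value pairs), and the toy gene names and values are all assumptions made for illustration.

```python
def filter_by_gene_list(dataset, gene_list):
    """Keep only the genes that appear in gene_list.

    dataset: list of (gene_name, expression_values) pairs (assumed layout).
    The order of genes in the dataset is preserved; the order of
    gene_list is ignored.
    """
    keep = set(gene_list)
    return [(gene, values) for gene, values in dataset if gene in keep]


# Hypothetical toy data; names and values are invented for illustration.
training = [("geneA", [1.2, 0.8]), ("geneB", [0.3, 0.5]), ("geneC", [2.1, 1.9])]
test = [("geneA", [1.0]), ("geneB", [0.4]), ("geneC", [2.0])]
gene_list = ["geneC", "geneA"]

filtered_training = filter_by_gene_list(training, gene_list)
filtered_test = filter_by_gene_list(test, gene_list)

print([gene for gene, _ in filtered_training])  # ['geneA', 'geneC']
print([gene for gene, _ in filtered_test])      # ['geneA', 'geneC']
```

Because the same list is applied to both datasets and the dataset order is kept, the two filtered results contain the same genes in the same order, provided the original datasets listed their genes in the same order.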
Because the classifier must receive the same inputs (genes) when it makes predictions as when it is trained, the training and test datasets are filtered in the same way. If this is not done, the classifier may produce nonsensical predictions. It is not strictly necessary to filter both datasets at the same time: you could filter the test data after you have created the classifier, as long as you do so before running the classifier on the test data.
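The requirement that the test data present the same genes, in the same order, with no extras can be expressed as a simple check. The sketch below is illustrative only and is not part of the application. The function name `check_same_inputs` and the gene names are hypothetical.

```python
def check_same_inputs(training_genes, test_genes):
    """Return a list of problems; an empty list means the inputs match."""
    training_set = set(training_genes)
    test_set = set(test_genes)
    problems = []

    # Genes the classifier was trained on but the test data lacks.
    missing = [g for g in training_genes if g not in test_set]
    if missing:
        problems.append("missing from test data: " + ", ".join(missing))

    # Genes in the test data that the classifier never saw.
    extra = [g for g in test_genes if g not in training_set]
    if extra:
        problems.append("extra genes in test data: " + ", ".join(extra))

    # Same genes, but arranged differently.
    if not missing and not extra and list(training_genes) != list(test_genes):
        problems.append("same genes but in a different order")

    return problems


# Hypothetical gene names for illustration.
print(check_same_inputs(["geneA", "geneC"], ["geneA", "geneC"]))
# []
print(check_same_inputs(["geneA", "geneC"], ["geneC", "geneA"]))
# ['same genes but in a different order']
print(check_same_inputs(["geneA", "geneC"], ["geneA", "geneB", "geneC"]))
# ['extra genes in test data: geneB']
```

Running a check like this on an unfiltered test dataset would report the extra genes, which is the situation that filtering the test data before classification avoids.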