Tutorial 4: Conclusion

 

Discussion of the Results:

If you create new SOMs from the same data using different random seeds, the distribution of samples should differ slightly each time. Certain features, however, should not change. For instance, there is consistently a small cluster of ALL-T samples, two clusters dominated by ALL-B samples, and a cluster of AML samples. The position of each of these clusters in the SOM will change, and some samples will move from one cluster to another, but certain samples cluster together consistently. For instance, sample AML-66 tends to cluster with ALL-B samples. This indicates that, under this clustering protocol, the gene expression profile of AML-66 is more like those of the ALL-B samples than those of the other AML samples. This sample might therefore be considered a candidate for further investigation. A good first step would be to repeat the analysis while varying other parameters, such as the gene filtering method, the normalization, and the type of distance metric, to determine whether the observation still holds.
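The idea of checking which samples cluster together across random seeds can be sketched outside GeneLinker. The following is a minimal illustration in Python, not GeneLinker's implementation: it trains a very small one-dimensional SOM several times with different seeds and reports, for each pair of samples, the fraction of runs in which both land on the same node. The function names, the training schedule, and the toy profiles are all assumptions made for this example.

```python
import math
import random
from itertools import combinations


def best_node(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def som_assignments(data, n_nodes, seed, epochs=40):
    """Train a small 1-D SOM and return each sample's best-matching node."""
    rng = random.Random(seed)
    # Initialize node weights from randomly chosen samples (an assumed scheme).
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs              # shrinks from 1 towards 0
        rate = 0.5 * decay                        # learning rate
        radius = max(decay * n_nodes / 2.0, 0.5)  # neighborhood width
        rng.shuffle(order)
        for i in order:
            x = data[i]
            bmu = best_node(weights, x)
            for j in range(n_nodes):
                # Gaussian neighborhood around the best-matching node.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(len(x)):
                    weights[j][k] += rate * h * (x[k] - weights[j][k])
    return [best_node(weights, x) for x in data]


def co_clustering(data, n_nodes, seeds):
    """Fraction of runs in which each pair of samples shares a node."""
    counts = {pair: 0 for pair in combinations(range(len(data)), 2)}
    for seed in seeds:
        nodes = som_assignments(data, n_nodes, seed)
        for a, b in counts:
            if nodes[a] == nodes[b]:
                counts[(a, b)] += 1
    return {pair: n / len(seeds) for pair, n in counts.items()}


# Synthetic toy profiles (not real expression data): two loose groups.
labels = ["A1", "A2", "A3", "B1", "B2", "B3"]
profiles = [[0.1, 0.2, 0.0], [0.0, 0.1, 0.1], [0.2, 0.0, 0.1],
            [5.0, 5.1, 4.9], [5.1, 4.9, 5.0], [4.9, 5.0, 5.1]]

freq = co_clustering(profiles, n_nodes=3, seeds=range(10))
for (a, b), f in sorted(freq.items()):
    print(f"{labels[a]}-{labels[b]}: together in {f:.0%} of runs")
```

Pairs with a high fraction across seeds correspond to the stable groupings described above. A sample that repeatedly shares a node with samples of a different class would stand out in the same way that AML-66 does in the tutorial data.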

The analysis steps in this tutorial are captured in the GeneLinker script file Tutorial4.gls. This workflow script applies value removal, missing value estimation, log2 normalization, and SOM generation to the selected dataset. An important aspect of missing value estimation is that the saved parameter is the fraction of missing values that triggers the removal of a gene, not the absolute number. The same script can therefore give slightly different results on datasets with different numbers of samples, because the same fraction corresponds to a different count of missing values.
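To make the effect of a fractional threshold concrete, here is a small Python sketch. It is not GeneLinker code. The function names are hypothetical, and the rule that a gene is removed when its missing fraction is greater than or equal to the threshold is an assumption, since the exact comparison GeneLinker uses is not stated here.

```python
import math


def min_missing_for_removal(n_samples, fraction):
    """Smallest number of missing values that reaches the given fraction."""
    return math.ceil(fraction * n_samples)


def filter_genes(genes, fraction):
    """Keep genes whose fraction of missing values is below the threshold.

    genes maps a gene name to a list of values, with None marking a
    missing value. Removal at >= fraction is an assumed rule.
    """
    kept = {}
    for name, values in genes.items():
        missing = sum(v is None for v in values)
        if missing / len(values) < fraction:
            kept[name] = values
    return kept


# The same saved fraction implies different absolute counts.
for n_samples in (8, 38, 72):
    print(n_samples, "samples:", min_missing_for_removal(n_samples, 0.25),
          "missing values trigger removal")

# A gene with two missing values is removed in an 8-sample dataset
# (2/8 = 0.25) but kept in a 10-sample dataset (2/10 = 0.20).
small = {"gene1": [1.0, None, None, 2.0, 3.0, 4.0, 5.0, 6.0]}
large = {"gene1": [1.0, None, None, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}
print(list(filter_genes(small, 0.25)), list(filter_genes(large, 0.25)))
```

The sample counts and values are illustrative only. The point is that a script saved with a fraction carries over to datasets of any size, at the cost of the absolute cutoff shifting with the number of samples.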

GeneLinker's scripting capability is described in the GeneLinker script generation and script running documentation.

When you are finished, you can close all the open plots either by clicking the 'x' box in the upper-right corner of each, or by selecting Close All from the Window menu.

 

References:

1. The basic reference on SOMs from the machine-learning perspective is Teuvo Kohonen, Self-Organizing Maps, 2nd edn. (Berlin: Springer, 1997). It contains no discussion of application to gene expression data.

2. T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander in 'Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring' [Science 286: 531, 1999] applied 2x1 and 4x1 SOMs to the first 38 samples of the AML/ALL dataset.

3. P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. S. Lander, and T. R. Golub, in 'Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation' [Proc Natl Acad Sci USA 96: 2907-2912, 1999] used a 6x5 SOM on 828 yeast genes.

4. P. Toronen, M. Kolehmainen, G. Wong, and E. Castren, in 'Analysis of gene expression data using self-organizing maps' [FEBS Lett 451: 142-146, 1999] analyzed 6400 yeast genes using a 16x16 SOM on the diauxic shift dataset.

5. A. Hill, C. P. Hunter, B. T. Tsung, G. Tucker-Kellogg, and E. L. Brown, in 'Genomic Analysis of Gene Expression in C. elegans' [Science 290: 809, 2000] used a 6x6 SOM on 4221 genes.

 

Where To Go From Here