
 

Tutorial 7: Appendix: Minimum Standard Deviation in IBIS

 

This appendix describes the choice and effect of the Minimum Standard Deviation parameter in IBIS.

 

Minimum Standard Deviation Too Small

In some datasets IBIS will find patterns like the one shown below:

The points for the class colored red fall nearly all on one straight line. If too small a value is chosen for the Minimum Standard Deviation, IBIS with a QDA or UGDA classifier will create a very narrow region covering those points and compute a very high accuracy.

However, the likelihood that such a classifier reflects biological reality is exceedingly small if the width of the class region is smaller than the random variation in gene expression inherent in the system.

Similarly, an LDA classifier could compute an unrealistically high accuracy by forming a class boundary between samples that are separated by less than the natural random variation in the expression of the genes.
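The effect of the parameter can be illustrated with a small Python sketch. It assumes, purely for illustration, that the minimum acts as a floor on the standard deviation along each principal axis of a class's covariance matrix. The tutorial does not say how IBIS applies the parameter internally, and the names floor_covariance, red and min_sd, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def floor_covariance(cov, min_sd):
    """Raise any principal-axis standard deviation below min_sd up to min_sd.

    Assumption for illustration only; not taken from the IBIS implementation.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues are variances
    floored = np.maximum(eigvals, min_sd ** 2)    # so compare against min_sd squared
    return eigvecs @ np.diag(floored) @ eigvecs.T

# Hypothetical "red" class: points lying almost exactly on a straight line.
rng = np.random.default_rng(0)
t = rng.uniform(-2.0, 2.0, size=30)
red = np.column_stack([t, t + rng.normal(0.0, 0.001, size=30)])

cov = np.cov(red, rowvar=False)
raw_sd = np.sqrt(np.linalg.eigvalsh(cov))
floored_sd = np.sqrt(np.linalg.eigvalsh(floor_covariance(cov, min_sd=0.2)))

print("width of class region without a floor:", raw_sd.min())
print("width of class region with min_sd=0.2:", floored_sd.min())
```

Without a floor, the narrow direction of the fitted region is far smaller than any plausible measurement noise. With the floor, the region cannot be narrower than the chosen minimum, while the long direction is left unchanged.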

 

Minimum Standard Deviation Too Large

On the other hand, significant effects can be obscured by setting the Minimum Standard Deviation too large. Consider the same dataset as depicted above, only this time with a larger Minimum Standard Deviation.

It is reasonable to suppose that the pattern here might be significant (within the limits imposed by the number of samples). But as the Minimum Standard Deviation is increased, the region predicted as 'red' becomes increasingly broad and eventually circular, until the legitimate linear correlation between the two genes for the red class samples is lost. At the same time, the accuracy score for these genes as predictors goes down rapidly, as the broadening prediction region takes in more and more blue samples. Therefore, setting the Minimum Standard Deviation much larger than the natural variation in the expression values can result in real patterns going undetected.
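The broadening described above can be sketched in Python under the same kind of assumption: that the minimum acts as a floor on the standard deviation along each principal axis of the red class's covariance. The data, the helper names, and the fixed Mahalanobis radius of 2 used to define the 'red' region are all hypothetical. The count of blue samples inside the region stands in for the loss of accuracy and is not the score IBIS reports.

```python
import numpy as np

def floor_covariance(cov, min_sd):
    # Assumed behaviour: floor each principal-axis standard deviation at min_sd.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(np.maximum(eigvals, min_sd ** 2)) @ eigvecs.T

def inside_region(points, mean, cov, radius=2.0):
    # True where a point lies within `radius` Mahalanobis units of the class mean.
    diff = points - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return d2 <= radius ** 2

# Synthetic data: red samples close to the line y = x, blue samples on a
# parallel line shifted by 0.6.
t = np.linspace(-2.0, 2.0, 21)
red = np.column_stack([t, t + 0.05 * np.sin(7 * t)])
blue = np.column_stack([t, t + 0.6])

mean = red.mean(axis=0)
cov = np.cov(red, rowvar=False)

results = []
for min_sd in (0.05, 0.2, 0.5, 1.0, 2.0):
    floored = floor_covariance(cov, min_sd)
    sds = np.sqrt(np.linalg.eigvalsh(floored))
    roundness = sds.min() / sds.max()          # 1.0 means a circular region
    blue_inside = int(inside_region(blue, mean, floored).sum())
    results.append((min_sd, roundness, blue_inside))
    print(f"min_sd={min_sd:<4}  roundness={roundness:.2f}  blue inside red region={blue_inside}")
```

In this sketch the smallest floor keeps every blue sample outside the red region, while the largest floor exceeds the spread of the red class in every direction, so the region becomes a circle that contains all of the blue samples.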

 

Default Value

GeneLinker computes a suggested Minimum Standard Deviation each time the IBIS Classifier Search dialog box is opened. The suggested (default) value is computed from a random sample of the data, so the number may be different each time. Because the Minimum Standard Deviation has an effect only in rare cases, and because the random variation in the default value is small, it is not usually necessary to change the default value. If you believe you have a case like one of those described above, you may wish to use a fixed estimate of the standard deviation for all IBIS runs. You may also wish to try several different values to see what effect they have on the classification accuracy and Mean Squared Error.
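The following Python sketch shows why an estimate drawn from a random sample changes slightly from one call to the next, and how fixing the value removes that variation. The estimator used here (the median per-gene standard deviation over a random subset of genes) is an assumption made for illustration; the tutorial does not describe how GeneLinker computes its suggestion. The function suggest_min_sd and the synthetic expression matrix are hypothetical.

```python
import numpy as np

def suggest_min_sd(expression, n_genes=200, seed=None):
    """Hypothetical estimator: median per-gene SD over a random subset of genes.

    `expression` is a genes x samples array. This is not GeneLinker's method.
    """
    rng = np.random.default_rng(seed)
    n = min(n_genes, expression.shape[0])
    picked = rng.choice(expression.shape[0], size=n, replace=False)
    return float(np.median(expression[picked].std(axis=1, ddof=1)))

# Synthetic stand-in for an expression matrix: 2000 genes, 40 samples.
data = np.random.default_rng(1).normal(0.0, 0.3, size=(2000, 40))

# A fresh random sample each call: the suggestion varies a little.
unseeded = [suggest_min_sd(data) for _ in range(3)]

# A fixed estimate, computed once and reused for every run.
fixed = suggest_min_sd(data, seed=42)

print("suggestions from fresh random samples:", unseeded)
print("fixed estimate reused for all runs:   ", fixed)
```

Recording one value and entering it for every IBIS run plays the role of the fixed estimate here. Trying several values around it and comparing the reported classification accuracy and Mean Squared Error is the manual counterpart of a parameter sweep.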