From visual examination of a hierarchical clustering, Wen et al. identified five groups or 'waves' plus a small number of outliers or 'other' genes. This step will demonstrate that GeneLinker™ can be used to get a similar clustering, using the K-Means clustering function.
The key feature of K-Means clustering is that you choose a priori the number of clusters you think the data should be divided into. This number is the 'K' in K-Means.
The K-Means algorithm uses the same Euclidean Average-Linkage distance metric used for hierarchical clustering earlier.
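To make the role of 'K' concrete, here is a minimal sketch of the general K-Means procedure in Python. It is not GeneLinker™'s implementation: the function name `kmeans`, its parameters, and the toy expression profiles are assumptions made for illustration. The sketch uses only the Euclidean distance between each point and each cluster mean, and does not model the between-clusters setting.

```python
import math
import random


def kmeans(points, k, seed=None, max_iter=100):
    """Partition `points` into k clusters; return one cluster index per point.

    Illustrative sketch only (hypothetical helper, not GeneLinker's code).
    """
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    # Start from k distinct data points chosen at random as the initial means.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest mean (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each mean to the average of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mean
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up expression profiles: three "low" genes and three "high" genes.
profiles = [
    (0.0, 0.1, 0.0), (0.1, 0.0, 0.1), (0.0, 0.0, 0.2),
    (5.0, 5.1, 4.9), (5.2, 5.0, 5.0), (4.9, 5.0, 5.1),
]
labels = kmeans(profiles, k=2, seed=999)
print(labels)
```

The number of clusters is an input, not something the algorithm discovers: asking for `k=3` on the same data would force a third cluster whether or not the data supports one.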
Perform Partitional Clustering
1. If the renamed normalization item in the Experiments navigator is not already highlighted, click it.
2. Click the Partitional Clustering toolbar icon, or select Partitional Clustering from the Clustering menu, or right-click the item and select Partitional Clustering from the shortcut menu. The Partitional Clustering parameters dialog is displayed.
3. Set dialog parameters.
Parameter | Setting |
Clustering Orientation | Cluster Genes |
Distance Measurements: Between Data Points | Euclidean |
Distance Measurements: Between Clusters | Average Linkage |
Algorithm Properties: Type | K-Means |
Algorithm Properties: Number of Means | 5 |
Algorithm Properties: Random Seed | 999 |
4. Click OK. The clustering operation is performed and, upon successful completion, a new Gene Partitional Clustering experiment is added to the Experiments navigator under the original dataset. Rename it if you like.
If you have automatic visualizations enabled in your user preferences, a matrix tree plot of the clustering results is displayed. You can close this plot when you are finished looking at it.
Use of the Random Seed Parameter
In normal use, setting the random seed is neither necessary nor recommended. In a tutorial you set the random seed to a consistent value so that you will obtain precisely the same results that we depict and discuss, which makes the tutorial easier to understand. When you are not following a tutorial, you should generally not adjust the random seed at all.
The random seed setting may affect irrelevant details, such as the labelling and ordering of clusters. In other cases the random seed may affect relevant details, such as which genes occur together in clusters. Because of this latter possibility, it is sometimes worth repeating an experiment with different random seeds to see what the effects are. (See 'The Centroid Plot: Variability in K-Means Clustering' in step 7, below.) GeneLinker™ helps with this by setting a new random seed every time an operation is carried out, so you don’t need to set one yourself.
On occasion you may need to determine whether a certain variation in results is due to the random element, or some other cause. For this reason you are able to set the random seed to a fixed value, thus controlling that source of variation.
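When comparing runs made with different seeds, it helps to separate the two kinds of difference described above: a change in cluster numbering versus a change in which genes are grouped together. The following Python sketch shows one way to do that outside GeneLinker™. The helper names `as_partition` and `same_membership` and the three label lists are hypothetical, invented for illustration; they are not GeneLinker™ output.

```python
def as_partition(labels):
    """Convert per-gene cluster labels into a set of groups of gene indices."""
    groups = {}
    for gene_index, label in enumerate(labels):
        groups.setdefault(label, set()).add(gene_index)
    # The cluster numbers are discarded; only the membership of each group is kept.
    return {frozenset(members) for members in groups.values()}


def same_membership(labels_a, labels_b):
    """True if two runs group the genes identically, ignoring cluster numbering."""
    return as_partition(labels_a) == as_partition(labels_b)


# Made-up cluster labels for five genes from three hypothetical runs.
run_a = [0, 0, 1, 1, 2]
run_b = [2, 2, 0, 0, 1]  # same groups as run_a, clusters numbered differently
run_c = [0, 1, 1, 1, 2]  # gene 1 has moved into a different group

print(same_membership(run_a, run_b))  # True: only the labelling differs
print(same_membership(run_a, run_c))  # False: the gene groupings differ
```

A `True` result means the seed changed only the labelling or ordering of clusters. A `False` result means the seed changed which genes cluster together, which is the case worth investigating.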