

Tutorial 6, Step 3: Discretize the Data


The first step in our analysis of this dataset is to use SLAM™ to look for associations between multiple genes and the tumor type.

SLAM™ finds associations between genes based on identical patterns of gene expression. For example, if Gene A is HIGH whenever Gene B is LOW, SLAM™ identifies an association between Gene A and Gene B. Because the number of possible patterns is enormous, particularly when looking for patterns among five or ten genes rather than just two, we need a fast, simple way to compare expression levels. Discretizing the data makes it possible to compare expression levels in terms of a small number of discrete categories (e.g., HIGH/MEDIUM/LOW) rather than continuous values. This speeds up the comparison process by many orders of magnitude.
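The kind of comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SLAM™'s actual algorithm: the gene profiles and the helper functions `co_occurrence` and `implies` are hypothetical. Once each sample is reduced to a label such as HIGH or LOW, checking a pattern is just a matter of comparing and counting labels.

```python
from collections import Counter

# Hypothetical discretized levels for two genes across six samples.
gene_a = ["HIGH", "HIGH", "LOW", "MEDIUM", "LOW", "HIGH"]
gene_b = ["LOW", "LOW", "HIGH", "MEDIUM", "HIGH", "LOW"]


def co_occurrence(*genes):
    """Count each combination of discrete levels seen across samples.

    Works for any number of genes: each sample contributes one tuple
    of labels, so patterns among five or ten genes are counted the same way.
    """
    return Counter(zip(*genes))


def implies(x, x_level, y, y_level):
    """True if, in every sample where x has x_level, y has y_level."""
    matches = [yi for xi, yi in zip(x, y) if xi == x_level]
    return bool(matches) and all(yi == y_level for yi in matches)


print(co_occurrence(gene_a, gene_b))
print(implies(gene_a, "HIGH", gene_b, "LOW"))  # Gene B is LOW whenever Gene A is HIGH
```

Comparing short string labels (or small integers) is far cheaper than reasoning about continuous values, which is the motivation the paragraph gives for discretizing first.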


Discretize the Data

1. Click the Khan_training_data dataset in the Experiments navigator. The item is highlighted.

2. Click the Discretize toolbar icon, or select Discretize Data from the Predict menu, or right-click the item and select Discretize Data from the shortcut menu. The Discretization parameters dialog is displayed.

Operation Type

Number of Bins

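Choosing the number of bins is a balancing act. The more bins you use, the less information the discretization discards. But the more bins there are, the fewer associations SLAM™ will find, because finer categories make it less likely that several genes share an identical pattern across samples.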
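The trade-off can be illustrated with a small, hypothetical example. Two genes rise together across twelve samples, with a few neighboring samples swapped; both are binned with equal-frequency bins, and we count the samples in which the two genes land in the same bin. The data and the function names `quantile_bins` and `matching_samples` are invented for illustration, and "same bin" is only a stand-in for however SLAM™ actually scores associations.

```python
def quantile_bins(values, n_bins):
    """Assign each value an equal-frequency bin index based on its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def matching_samples(x, y, n_bins):
    """Count samples in which two genes fall into the same bin."""
    bx = quantile_bins(x, n_bins)
    by = quantile_bins(y, n_bins)
    return sum(a == b for a, b in zip(bx, by))


# Hypothetical data: gene_y tracks gene_x closely, with small local swaps.
gene_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
gene_y = [2, 1, 3, 5, 4, 6, 8, 7, 9, 11, 10, 12]

for n_bins in (2, 3, 6, 12):
    print(n_bins, matching_samples(gene_x, gene_y, n_bins))
```

With 2, 3, 6, and 12 bins the two genes agree in 12, 10, 8, and 4 of the 12 samples respectively: the finer the bins, the more small fluctuations break an otherwise clear pattern.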

Accept the default parameters (Quantile discretization, Per Gene, and 3 bins).
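As a rough sketch of what quantile (equal-frequency) discretization per gene means, the code below ranks each gene's values across samples and splits those ranks into three equally sized groups. The function names, the sample values, and the rank-based tie handling (tied values may be split across bins) are assumptions; the tool's own implementation may differ.

```python
LEVELS = ["LOW", "MEDIUM", "HIGH"]  # labels for the 3-bin case


def quantile_discretize(values, n_bins=3):
    """Equal-frequency binning of one gene's values across samples.

    Each value gets a bin from its rank, so every bin receives roughly
    the same number of samples regardless of the gene's raw scale.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def discretize_per_gene(expression, n_bins=3):
    """Apply quantile binning to each gene (row) independently."""
    return {gene: quantile_discretize(vals, n_bins) for gene, vals in expression.items()}


# Hypothetical expression values: rows are genes, columns are samples.
expression = {
    "gene_1": [0.2, 5.1, 1.3, 9.8, 0.7, 3.3],
    "gene_2": [120.0, 80.0, 95.0, 130.0, 60.0, 110.0],
}

for gene, bins in discretize_per_gene(expression).items():
    print(gene, [LEVELS[b] for b in bins])
```

Because binning is done per gene, gene_2's much larger raw values do not matter: each gene ends up with exactly two samples in each of LOW, MEDIUM, and HIGH.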

3. Click OK. The Experiment Progress dialog is displayed. It is dynamically updated as the discretization operation is performed.

Upon successful completion, a new dataset named "Discretized: 3 bins/gene | quantile" is added under Khan_training_data in the Experiments navigator.