
Overview
In GeneLinker™ the term normalization is used to describe scaling, translation, or any other numerical transformation of the data besides filtering. These transformations fall into three broad categories:
You may need to correct for non-biological variations between different samples. For example, unintentional differences in hybridization procedures or between microarray chip manufacturing batches may cause systematic differences between samples. Normalizations which can help correct these sources of variation include mean scaling, median scaling, linear regression and control gene normalizations.
Two-color data must be merged into ratios, and dye biases can also be corrected for at the same time.
If you are going on to study the data by clustering, you may need to put different genes on a single scale of variation. Normalizations which may accomplish this include logarithm, standardization, division by maximum and scaling between 0 and 1.
Any number of these normalizations can be applied to a dataset in succession. For instance, it may be appropriate to scale samples to correct for non-biological variations, and then place genes on a common scale before clustering, association mining or supervised learning takes place.
Techniques for Correcting Non-Biological Variation Between Samples
Linear Regression: This procedure scales the values relative to a baseline sample so that every sample has the same best-fit slope against that baseline. All genes can be fitted, or only a user-selected set of 'housekeeping' genes.
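A minimal Python sketch of regression scaling for a single sample, assuming the data is held as one list of values per sample (genes in a fixed order) and that the fit is an ordinary least-squares line through the origin. The text does not say whether GeneLinker fits an intercept, and the function name `regression_scale` is hypothetical.

```python
from typing import Optional, Sequence


def regression_scale(baseline: Sequence[float], sample: Sequence[float],
                     genes: Optional[Sequence[int]] = None) -> list[float]:
    """Scale `sample` so its best-fit slope against `baseline` becomes 1.

    Assumption: least-squares line through the origin (sample = slope * baseline).
    `genes` optionally lists the row indices of housekeeping genes to fit;
    by default every gene is used.
    """
    idx = range(len(baseline)) if genes is None else genes
    sxy = sum(baseline[i] * sample[i] for i in idx)
    sxx = sum(baseline[i] ** 2 for i in idx)
    slope = sxy / sxx
    # Every gene is divided by the slope, not just the genes used in the fit.
    return [v / slope for v in sample]
```

For example, `regression_scale([1, 2, 3, 4], [2, 4, 6, 8])` fits a slope of 2 and returns `[1.0, 2.0, 3.0, 4.0]`. Passing `genes=[0, 1]` restricts the fit to the first two genes.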
Division by Central Tendency (Mean): This procedure scales the expression values so that all samples have a common mean.
Division by Central Tendency (Median): This procedure scales the expression values so that all samples have a common median.
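Both central-tendency scalings can be sketched with one function that takes the summary statistic as a parameter. This assumes each sample is a list of positive expression values and that the common target is the average of the per-sample means (or medians); the text does not state which target value GeneLinker uses, and `scale_to_common_center` is a hypothetical name.

```python
import statistics
from typing import Callable, Sequence


def scale_to_common_center(
    samples: Sequence[Sequence[float]],
    center: Callable[[Sequence[float]], float] = statistics.mean,
) -> list[list[float]]:
    """Rescale each sample so that all samples share the same mean (or median).

    Assumption: the shared target is the average of the per-sample centers.
    """
    centers = [center(s) for s in samples]
    target = statistics.mean(centers)
    # Multiplying by target / c moves this sample's center from c to target.
    return [[v * target / c for v in s] for s, c in zip(samples, centers)]
```

Pass `center=statistics.median` for median scaling. With samples `[[1, 2, 3], [2, 4, 6]]` (means 2 and 4), both rescaled samples become `[1.5, 3.0, 4.5]`. The median variant is less sensitive to a few very high-expression genes.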
Positive and Negative Control Genes: In some experiments there may be one or more control genes whose values are expected to be constant. With multiple controls, the median or mean is calculated over all of the controls.
Normalization relative to negative controls subtracts the median or mean of the controls within the sample. Negative control genes are understood to be absent or below a detection threshold.
Normalization relative to positive controls divides each sample by the mean or median of the controls. Positive control genes are understood to be present in constant abundance in all samples.
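A short sketch of both control-gene normalizations for one sample, assuming `controls` holds the row indices of the control genes. The function names are hypothetical, and the median is used by default only for illustration; the text allows either the mean or the median.

```python
import statistics
from typing import Callable, Sequence

Center = Callable[[Sequence[float]], float]


def negative_control_normalize(sample: Sequence[float], controls: Sequence[int],
                               center: Center = statistics.median) -> list[float]:
    """Subtract the central value of the negative (absent) controls."""
    background = center([sample[i] for i in controls])
    return [v - background for v in sample]


def positive_control_normalize(sample: Sequence[float], controls: Sequence[int],
                               center: Center = statistics.median) -> list[float]:
    """Divide by the central value of the positive (constant-abundance) controls."""
    reference = center([sample[i] for i in controls])
    return [v / reference for v in sample]
```

With `sample = [10, 12, 50, 100]`, negative controls at rows 0 and 1 (median 11) give `[-1.0, 1.0, 39.0, 89.0]`, while a single positive control at row 2 (value 50) gives values expressed as multiples of that control.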
Techniques for Adjusting Two-Color Data
Lowess: The log-ratio expression values are adjusted by a locally weighted linear regression on each sample to account for intensity-dependent dye bias.
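The sketch below follows the common MA-plot formulation of lowess dye-bias correction, which is an assumption about how the method works here: M = log2(treatment / control) is the log-ratio, A = 0.5 * log2(treatment * control) is the average log-intensity, a locally weighted linear fit of M on A is computed with tricube weights, and the fitted trend is subtracted from M. The text does not give GeneLinker's span, weighting function or robustness iterations; the names and the default `frac=0.5` are hypothetical, and channel intensities must be positive.

```python
import math
from typing import Sequence


def lowess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.5) -> list[float]:
    """Locally weighted linear fit of y on x, evaluated at every x.

    Single pass with tricube weights; no robustness iterations.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each neighbourhood
    fitted = []
    for xi in x:
        # Bandwidth: distance from xi to its k-th nearest point (itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1.0 - min(abs(xj - xi) / h, 1.0) ** 3) ** 3 for xj in x]
        sw = sum(w)  # at least 1, because xi itself has weight 1
        xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (xi - xm))
    return fitted


def lowess_normalize(treatment: Sequence[float], control: Sequence[float],
                     frac: float = 0.5) -> list[float]:
    """Return dye-bias-corrected log2 ratios for one two-color sample."""
    m = [math.log2(t / c) for t, c in zip(treatment, control)]        # log-ratio
    a = [0.5 * math.log2(t * c) for t, c in zip(treatment, control)]  # mean log-intensity
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

If every spot reads exactly twice as bright in the treatment channel, M is 1 everywhere, the fitted trend is flat at 1, and the corrected log-ratios are all 0. A bias that varies with intensity is removed in the same way, one neighbourhood at a time.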
Logarithm: Gene expression values are replaced with the logarithm of their values. Taking the logarithm equalizes the influence of up- and down-regulated genes in ratio experiments.
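A worked example, assuming base-2 logarithms (the text does not specify the base): on the linear scale a two-fold increase (ratio 2) sits 1 unit above the no-change ratio of 1, while a two-fold decrease (ratio 0.5) sits only 0.5 below it. After the transform they become +1 and -1, symmetric about 0. The function name is hypothetical.

```python
import math


def log2_ratios(ratios):
    """Replace each ratio with its base-2 logarithm (ratios must be > 0)."""
    return [math.log2(r) for r in ratios]


# Two-fold up, unchanged, two-fold down, four-fold down.
print(log2_ratios([2.0, 1.0, 0.5, 0.25]))  # [1.0, 0.0, -1.0, -2.0]
```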
Subtraction of Central Tendency: This procedure transforms the expression values such that all samples have zero mean or median.
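A minimal sketch of subtracting a sample's central value, assuming the values are already log-ratios so that after centering a typical gene in each sample reads 0 (no change). The function name and the default choice of the median are illustrative assumptions.

```python
import statistics
from typing import Callable, Sequence


def subtract_center(sample: Sequence[float],
                    center: Callable[[Sequence[float]], float] = statistics.median
                    ) -> list[float]:
    """Shift a sample so its median (or mean) becomes zero."""
    c = center(sample)
    return [v - c for v in sample]
```

For instance, `subtract_center([3, 1, 2])` shifts by the median 2 and gives `[1, -1, 0]`; `subtract_center([1, 2, 6], statistics.mean)` shifts by the mean 3 and gives `[-2, -1, 3]`.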
The Lowess normalization automatically merges the treatment and control channels into adjusted ratios. Any other operation on a two-color table automatically uses the unadjusted ratios.
Note: Lowess is the only normalization option for incomplete two-color datasets.
Techniques for Placing Different Genes on a Similar Scale
Logarithm: Gene expression values are replaced with the logarithm of their values. In non-ratio experiments, taking the logarithm reduces the influence of high-abundance genes in comparison to low-abundance genes.
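A small illustration, assuming base-10 logarithms for readability (the base is not specified in the text) and strictly positive values. On the raw scale the gaps between 10, 100 and 10000 are 90 and 9900, so the high-abundance value dominates any distance calculation; after the transform the values are 1, 2 and 4, and the gaps shrink to 1 and 2. The function name is hypothetical.

```python
import math


def log10_transform(values):
    """Replace each expression value with its base-10 logarithm (values > 0)."""
    return [math.log10(v) for v in values]


raw = [10.0, 100.0, 10000.0]
logged = log10_transform(raw)  # approximately [1.0, 2.0, 4.0]
```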
Divide by Maximum: Gene expression values are scaled such that the largest value for each gene becomes one.
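A per-gene sketch, assuming each gene's values are held in one list and that its maximum is positive. How GeneLinker treats a gene whose maximum is zero or negative is not stated; this hypothetical function simply raises an error in that case.

```python
def divide_by_max(gene):
    """Scale one gene's values so that its largest value becomes 1."""
    peak = max(gene)
    if peak <= 0:
        # Assumption: refuse genes without a positive maximum.
        raise ValueError("divide-by-maximum assumes a positive maximum")
    return [v / peak for v in gene]
```

For example, `divide_by_max([2.0, 5.0, 10.0])` returns values of about `[0.2, 0.5, 1.0]`.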
Scaling Between 0 and 1: Gene expression values are scaled such that the smallest value for each gene becomes zero and the largest value becomes one. Also known as Min-Max Normalization.
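A per-gene sketch of min-max scaling. The text does not say what happens to a gene whose values are all equal (the range is then zero); this hypothetical function returns all zeros in that case as an assumed convention.

```python
def min_max_scale(gene):
    """Map one gene's smallest value to 0 and its largest value to 1."""
    lo, hi = min(gene), max(gene)
    span = hi - lo
    if span == 0:
        # Assumption: a constant gene is mapped to all zeros.
        return [0.0] * len(gene)
    return [(v - lo) / span for v in gene]
```

For example, `min_max_scale([2.0, 4.0, 6.0, 10.0])` returns `[0.0, 0.25, 0.5, 1.0]`.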
Standardize: Gene expression values are scaled such that each gene has an average of zero and a standard deviation of one.
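A per-gene sketch of standardization (z-scores). It assumes the sample standard deviation (dividing by n - 1); the text does not say whether GeneLinker uses that or the population form, which would be `statistics.pstdev`. The gene needs at least two values that are not all equal.

```python
import statistics


def standardize(gene):
    """Rescale one gene to mean 0 and standard deviation 1."""
    mu = statistics.mean(gene)
    sd = statistics.stdev(gene)  # assumption: sample SD (n - 1 denominator)
    return [(v - mu) / sd for v in gene]
```

After this transform, genes with very different absolute expression levels vary on the same scale, so no single gene dominates a distance-based clustering.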