Importing Data from Affymetrix MAS 5.0 Files

Overview

The data files must be in two-column format, with m/Z in the first column and spectral value in the second column. If there are more than two columns, the additional columns are ignored. Columns must be separated by whitespace or a comma.

The files may have a header consisting of lines beginning with a "#" character. Data will be read from the first line that does not begin with a "#" character to the end of the file, or until another line is found that begins with a "#" character. If data import fails, inspect your data files first, as corrupt files are common.
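The reading rules above can be sketched in a few lines of Python. This is only an illustration of the file format, not the import script itself; the function name read_two_column is hypothetical, and skipping blank lines and tolerating leading whitespace before "#" are assumptions not stated above.

```python
import re

def read_two_column(lines):
    """Parse (m/Z, value) pairs from an iterable of text lines."""
    data = []
    started = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            if started:
                break      # a "#" line after data has begun ends the data
            continue       # still inside the header
        if not stripped:
            continue       # assumption: blank lines are skipped
        # Columns are separated by whitespace or a comma.
        fields = [f for f in re.split(r"[,\s]+", stripped) if f]
        # Only the first two columns are used; any others are ignored.
        data.append((float(fields[0]), float(fields[1])))
        started = True
    return data
```

A line that cannot be parsed as two numbers raises an error here, which is one quick way to locate a corrupt file.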

There are three scripts available for spectral or peak data:

Peak Data

Use this script on files that contain peak intensities that have been extracted by other software. This script will use various heuristics to reconcile peaks from different spectra and make intelligent judgements as to when two peaks with slightly different floating-point mean values are in fact "the same."
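The script's actual heuristics are not described here. As a rough illustration of the general idea only, the sketch below pools peak positions from several spectra and groups values that lie within a relative tolerance of each other. The function name reconcile_peaks, the rel_tol parameter, and the greedy grouping rule are all assumptions made for this example.

```python
def reconcile_peaks(spectra, rel_tol=1e-3):
    """Group nearby peak positions and return one consensus value per group.

    spectra is a list of lists of peak positions, one list per spectrum.
    """
    pooled = sorted(mz for peaks in spectra for mz in peaks)
    clusters = []
    for mz in pooled:
        # Join the current group if mz is within rel_tol of its first member.
        if clusters and mz - clusters[-1][0] <= rel_tol * clusters[-1][0]:
            clusters[-1].append(mz)
        else:
            clusters.append([mz])
    # Use the mean position of each group as the reconciled peak.
    return [sum(c) / len(c) for c in clusters]
```

With rel_tol=1e-3, peaks at 1000.0 and 1000.4 from two different spectra would be treated as the same peak, while 1000.0 and 2000.0 would not.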

Spectral (proteomics) Data

Use this script to import spectral data, rebinning it to 20,000 channels with equal-width bins. This is not recommended for mass-spec data, for which the dM/M = K script should be used.

Spectral (proteomics) Data (dM/M = K)

Use this script to import mass-spec data. It will rebin the data into bins that vary in width such that the constant resolution of the spectrometer is reflected in the bin width. This will ensure that channel height is at least a first-order estimator of peak area.
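One way to obtain bins whose width is proportional to m/Z is to space the bin edges geometrically, so that each edge is a fixed multiple of the previous one. The sketch below shows that construction; the function name constant_resolution_edges is hypothetical, and the script may compute its bins differently.

```python
def constant_resolution_edges(mz_min, mz_max, n_bins):
    """Return n_bins + 1 bin edges with width proportional to m/Z.

    Each edge is the previous edge times a fixed ratio, so for every bin
    (width / lower edge) = ratio - 1, which plays the role of K in dM/M = K.
    Assumes 0 < mz_min < mz_max.
    """
    ratio = (mz_max / mz_min) ** (1.0 / n_bins)
    edges = [mz_min * ratio ** i for i in range(n_bins)]
    edges.append(mz_max)  # pin the last edge exactly to mz_max
    return edges
```

For example, four bins from 1,000 to 16,000 give a ratio of 2, so the edges fall near 1,000, 2,000, 4,000, 8,000 and 16,000, and every bin is as wide as its own lower edge.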

Import Process

Multiple files are processed into a single dataset. The sample order of the imported dataset is determined by the order of the source sample data files listed in the Import Data dialog.

Any file headers are discarded.
Spectra are rebinned to have approximately 20,000 bins. Rebinning is done in a fully area-preserving way: an original bin that spans a bin boundary in the rebinned spectrum is split across that boundary in proportion to its overlap with each new bin.
If the dM/M = K version of the script is used, the spectra will be rebinned with bin width proportional to m/Z value. This is the recommended means of rebinning mass-spec data, as it will ensure that each bin has equal significance over the full range of the spectrum, and make channel height a reasonable stand-in for peak area.
m/Z values are converted to "Gene Identifiers" by prepending a "P" to them. They should be treated as Custom identifiers.
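The rebinning and identifier steps above can be sketched as follows. This is a minimal illustration, not the script's code. It assumes each original bin is described by its two edges and the area it holds, that all edges are sorted in increasing order, and that area is spread uniformly within a bin. The names rebin_area_preserving and mz_to_identifier are hypothetical, and the number formatting used for the identifier is an assumption.

```python
def rebin_area_preserving(src_edges, src_areas, dst_edges):
    """Redistribute bin areas from src_edges onto dst_edges.

    A source bin that straddles a destination boundary is split in
    proportion to how much of its width falls on each side.
    """
    dst = [0.0] * (len(dst_edges) - 1)
    j = 0  # first destination bin that can still overlap a source bin
    for i, area in enumerate(src_areas):
        lo, hi = src_edges[i], src_edges[i + 1]
        width = hi - lo
        while j < len(dst) and dst_edges[j + 1] <= lo:
            j += 1
        k = j
        while k < len(dst) and dst_edges[k] < hi:
            overlap = min(hi, dst_edges[k + 1]) - max(lo, dst_edges[k])
            if overlap > 0:
                dst[k] += area * overlap / width
            k += 1
    return dst

def mz_to_identifier(mz):
    """Build an identifier by prepending "P" to the m/Z value."""
    return "P" + str(mz)
```

When the destination edges cover the whole source range, the total area is unchanged by rebinning, which is what keeps the rebinned channels comparable across samples.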