The program can be downloaded here
This version is based on the previous one, with two additions:
it supports replicate samples, and it requires the identified genes to have a significantly high number of positive pairs and a low number of negative pairs.
This program identifies genes enriched in one sample with respect to all others by evaluating specific parameters from microarray data provided by the Affymetrix software.
The code is written in Matlab (The MathWorks), which must be installed on your computer to run the program.
The parameters used for comparison are: average difference, positive pairs, negative pairs, and pairs used.
It can take replicate analyses into account, so that each replicate from each sample is compared independently to all replicates from the remaining samples.
The program identifies genes that are enriched in one sample with respect to all others based on three criteria:
i) the identified genes must be expressed at high abundance levels (that is, their average difference value must exceed a minimum threshold, which can be set by the user);
ii) they must be overexpressed relative to all other samples by a minimum factor, set by the user;
iii) they should have a significant number of positive pairs (and a low number of negative pairs), as determined by the Affymetrix software (which indicates the significance of hybridization of biotinylated cRNA to the probe cells on the arrays). The minimum value of [positive - negative] pairs can also be specified by the user, and it can vary depending on the total number of pairs used to represent the probe set on the arrays. Thus, this minimum can be set independently for probes represented by a high or a low number of pairs. The cutoff for probes represented by a high number of pairs (typically 16) should be entered under “dposneg2” in this program, and the cutoff for probes represented by a lower number of pairs should be entered under “dposneg1”.
When you open the program, the following graphical user interface appears:

Provide the information requested at each prompt. The parameters requested are explained in the following description of the software:
Formally, the code finds genes enriched in sample X that satisfy the following conditions:
i) [Average difference for gene G in sample X] > minimum average difference threshold
ii) [Average difference for gene G in sample X] / [average difference for gene G in each of the other samples] > ratio threshold, OR < 0
iii) Pos - neg > dposneg2, if pairs used >= pairs threshold
Pos - neg > dposneg1, if pairs used < pairs threshold
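The three criteria can be sketched in a few lines of Python for a single gene in a single replicate. This is only an illustration of the conditions as described in the text, not the Matlab code itself. The function name `is_enriched`, the argument names other than `dposneg1` and `dposneg2`, the handling of a zero denominator, and the numeric values in the usage lines are all assumptions made for the example.

```python
def is_enriched(target, others, min_avg_diff, ratio_threshold,
                pairs_threshold, dposneg1, dposneg2):
    """Check criteria i) to iii) for one gene in one replicate of sample X.

    target: (average difference, positive, negative, pairs used) for the gene
            in the replicate being tested
    others: average difference values for the same gene in every replicate
            of every other sample
    """
    avg_diff, pos, neg, pairs_used = target

    # i) abundance: average difference must exceed the minimum threshold
    if avg_diff <= min_avg_diff:
        return False

    # ii) overexpression: the ratio against each other replicate must exceed
    #     the ratio threshold, or be negative
    for other in others:
        if other == 0:
            # Assumption: a zero denominator is treated as an infinitely
            # large ratio, so the comparison passes.
            continue
        ratio = avg_diff / other
        if not (ratio > ratio_threshold or ratio < 0):
            return False

    # iii) positive minus negative pairs must exceed the cutoff that applies
    #      to this probe set's number of pairs
    cutoff = dposneg2 if pairs_used >= pairs_threshold else dposneg1
    return (pos - neg) > cutoff


# Illustrative values only (not the program's defaults)
passes = is_enriched((500, 14, 1, 16), [50, 40, -10, 60],
                     min_avg_diff=200, ratio_threshold=5,
                     pairs_threshold=16, dposneg1=2, dposneg2=8)
fails = is_enriched((500, 14, 1, 16), [50, 200],
                    min_avg_diff=200, ratio_threshold=5,
                    pairs_threshold=16, dposneg1=2, dposneg2=8)
```

In the first call every ratio is either above 5 or negative and Pos - neg is 13, so the gene passes; in the second call the ratio 500/200 = 2.5 is below the ratio threshold, so it fails.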
The data must be supplied in matrix format. Each sample must be saved in a separate text file. In each file,
number of columns = number of replicates x 4
number of rows = number of genes.
For each sample, create a text file containing the following columns, in this order:
| average difference (replicate 1) | positive (replicate 1) | negative (replicate 1) | pairs used (replicate 1) | average difference (replicate 2) | positive (replicate 2) | negative (replicate 2) | pairs used (replicate 2) |
Further replicates follow as additional groups of four columns. There should be as many rows as genes being compared. Make sure the genes are sorted in the same way (e.g., alphabetically) in all replicates and in all samples.
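The layout described above (four columns per replicate, one row per gene) can be checked before the file is loaded into the program. The following Python sketch splits the contents of one sample file into replicates and verifies the column count. It assumes the values are separated by whitespace, which the text does not state; `parse_sample` and the example numbers are hypothetical.

```python
def parse_sample(text, n_replicates):
    """Split one sample's data into per-replicate lists of 4-value rows.

    Each row holds (average difference, positive, negative, pairs used).
    Assumption: values on a line are separated by whitespace.
    """
    expected = 4 * n_replicates
    replicates = [[] for _ in range(n_replicates)]
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        values = [float(v) for v in line.split()]
        if len(values) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} columns, got {len(values)}"
            )
        for r in range(n_replicates):
            replicates[r].append(tuple(values[4 * r:4 * r + 4]))
    return replicates


# Two genes, two replicates (made-up numbers)
example = """\
512.3 14 1 16 498.0 13 2 16
35.1 3 5 16 41.7 4 4 16
"""
parsed = parse_sample(example, n_replicates=2)
```

Here `parsed[0]` holds the rows for replicate 1 and `parsed[1]` the rows for replicate 2, in the same gene order as the file.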
To start:
Under the File menu, select “Set Path”. Under the Path menu, select “Add to Path”. Enter the name of the directory where this code has been downloaded (e.g. c:/temp), then click OK. Finally, under the File menu, select “Save Path”, and then “Exit”.
Now you are ready to start the program. Simply type
tk_gui
The graphical user interface shown above will appear.
Provide the data prompted:
To see the results, open the output file in Excel and choose to open the data as delimited by spaces (not tabs). You will find the list of indexes corresponding to the genes enriched in each replicate. For example,
| Sample | 1 |
| List | of | potentially | intersting | transcripts | in | each | replicate |
| 253 | 506 | 1197 | 1768 |
means that the 253rd, 506th, 1197th, and 1768th genes (in the 253rd, 506th, 1197th, and 1768th rows of the data matrix that you loaded) are enriched in sample 1 with respect to all other samples.
If you have multiple replicates for each sample, the genes (i.e. gene indexes) identified in each replicate are listed, together with the genes that appear to be enriched in all replicates of the same sample.
To see the names of the genes or the other parameters provided by the Affymetrix software (i.e. average difference value, positive pairs, etc.), we recommend using Microsoft Access to quickly look up the values for the enriched genes corresponding to the indexes identified.
The output file also records the thresholds used for the comparison, for future reference.
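As an alternative to a database lookup, the indexes in the output file can be mapped back to gene names with a short script. The sketch below assumes that you have a list of gene names in the same order as the rows of your data files, and that the indexes count rows starting from 1; `lookup_genes` and the gene names are hypothetical.

```python
def lookup_genes(indexes, gene_names):
    """Return (index, gene name) pairs for 1-based row indexes."""
    result = []
    for i in indexes:
        if not 1 <= i <= len(gene_names):
            raise IndexError(f"index {i} is outside 1..{len(gene_names)}")
        # Row 1 of the data matrix is element 0 of the Python list
        result.append((i, gene_names[i - 1]))
    return result


# Hypothetical gene list, in the same order as the rows of the data files
gene_names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
hits = lookup_genes([2, 5], gene_names)
```

With this gene list, index 2 maps to `geneB` and index 5 maps to `geneE`.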
Consider the following case: samples A, B, and C are compared, and each is analyzed in duplicate. A1 is sample A, replicate 1; A2 is sample A, replicate 2; B1 is sample B, replicate 1; etc.
To find genes enriched in B1, for example, all values from B1 are simultaneously compared against A1, A2, C1, and C2. The genes that satisfy the criteria described above are then identified. Genes enriched in B2 are identified independently, also by comparison against A1, A2, C1, and C2.
There should be 3 data files (one for A, one for B, and one for C), each containing the columns specified above, in that order.
n_samples= 3
n_replicates=2
The other parameters can be chosen freely. As a starting point, we recommend the program's default values.
The output file will provide the indexes of the genes enriched in each replicate for each of your samples (A, B, and C), which are numbered in the order in which you loaded them into the program.
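The comparison scheme in this example can be written out explicitly. The Python sketch below lists, for each replicate, the replicates it is compared against, and then intersects the gene indexes found in the replicates of one sample to obtain the genes enriched in all of them. The function names and the index lists are made up for illustration and do not come from the program.

```python
def comparison_plan(samples):
    """Map each replicate label to the replicates it is compared against.

    samples: dict of sample name -> number of replicates, in load order.
    """
    labels = {s: [f"{s}{r}" for r in range(1, n + 1)]
              for s, n in samples.items()}
    plan = {}
    for sample, own in labels.items():
        # All replicates of every other sample
        others = [lab for other, labs in labels.items()
                  if other != sample for lab in labs]
        for lab in own:
            plan[lab] = others
    return plan


def enriched_in_all_replicates(per_replicate_hits):
    """Gene indexes that appear in every replicate of one sample."""
    sets = [set(hits) for hits in per_replicate_hits]
    return sorted(set.intersection(*sets)) if sets else []


plan = comparison_plan({"A": 2, "B": 2, "C": 2})
# Made-up gene indexes for the two replicates of one sample
common = enriched_in_all_replicates([[253, 506, 1197], [506, 1197, 1768]])
```

For the case above, `plan["B1"]` and `plan["B2"]` both list A1, A2, C1, and C2, and `common` contains only the indexes found in both replicates (506 and 1197).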