Protocol for collection and integration of microarray data  
  Babru Samal  
  October 16, 2010  
     
  Source for raw data: GEO (Gene Expression Omnibus)  
   
     
  The Protocol:  
 
   

Processed signal values from GSE files were collected. CEL files were not always available and were therefore excluded. The GPL link served as the source of information on the microarray platforms used. For each dataset (GSE), the expression data from the GSM files were aligned side by side using the probe set IDs as the common element. Present and absent calls were ignored because they were not available for all datasets. An average expression value was derived for each platform within a GSE and normalized by setting the expression value of beta-2 microglobulin to 100. The expression values of the genes for each region were then combined across investigators and platforms using JMP software, which grouped the values by Entrez Gene ID. Once the values derived from the various GSEs were aligned, the final relative expression value was calculated as the average (if there were three or fewer GSEs) or the trimmed mean (if there were more than three GSEs). The final files for each region were then imported into Microsoft Access to create the dataset for the MySQL database.
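The normalization and combination steps above can be sketched in Python as follows. This is only an illustration and not the JMP procedure that was actually used. The function names, the "B2m" and "geneA" keys, and the sample numbers are invented for the example. The fraction trimmed from each tail (0.25) is an assumption, since the protocol does not state it, and the number of GSEs is counted per gene, which is also an assumption.

```python
from statistics import mean


def normalize_to_reference(expr, ref_gene_id, target=100.0):
    """Rescale one platform's averaged values so the reference gene equals `target`."""
    ref = expr[ref_gene_id]
    if ref <= 0:
        raise ValueError("reference gene must have a positive expression value")
    return {gene: value * target / ref for gene, value in expr.items()}


def trimmed_mean(values, trim_fraction=0.25):
    """Mean after dropping `trim_fraction` of the values from each tail (assumed fraction)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    if 2 * k >= len(ordered):
        raise ValueError("trim_fraction removes every value")
    return mean(ordered[k:len(ordered) - k])


def combine_across_datasets(per_gse_values, trim_fraction=0.25):
    """Combine normalized values per gene: plain mean for <= 3 GSEs, trimmed mean otherwise."""
    combined = {}
    for gene in set().union(*per_gse_values):
        values = [d[gene] for d in per_gse_values if gene in d]
        if len(values) <= 3:
            combined[gene] = mean(values)
        else:
            combined[gene] = trimmed_mean(values, trim_fraction)
    return combined


# Invented example: two datasets, keyed by placeholder gene identifiers.
gse_a = normalize_to_reference({"B2m": 2000.0, "geneA": 500.0}, "B2m")
gse_b = normalize_to_reference({"B2m": 400.0, "geneA": 120.0}, "B2m")
relative = combine_across_datasets([gse_a, gse_b])
```

With only two datasets, the example takes the plain-mean branch; the trimmed mean would apply only once a gene has values from more than three GSEs.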

The latest gene-related information was obtained from ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/ as Mus_musculus.gene_info.gz. The essential gene information (Entrez GeneID, gene symbol, gene title, HGNC/MGI/RGD and Ensembl identifiers, etc.) was collected as a text file and imported into Microsoft Access. The averaged expression values from each region were then aligned against this file using the GeneID as the common element.
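The alignment of expression values against the gene information file can be sketched in Python as follows; the original work used Microsoft Access for this step. The sketch assumes the tab-delimited gene_info layout with a header line beginning with "#tax_id" and columns named GeneID, Symbol, dbXrefs and description. The file layout at the time of the original work may have differed. The function names and the sample rows are invented for illustration.

```python
import csv


def parse_gene_info(lines):
    """Read tab-delimited gene_info lines into a dict keyed by GeneID (as a string)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # the header line starts with '#tax_id'
    genes = {}
    for row in reader:
        record = dict(zip(header, row))
        xrefs = {}
        # dbXrefs is assumed to be pipe-separated, e.g. 'MGI:MGI:1|Ensembl:ENSMUSG1'
        for entry in record["dbXrefs"].split("|"):
            database, sep, accession = entry.partition(":")
            if sep:
                xrefs[database] = accession
        genes[record["GeneID"]] = {
            "symbol": record["Symbol"],
            "title": record["description"],
            "xrefs": xrefs,
        }
    return genes


def annotate(expression, genes):
    """Join expression values to gene annotation on GeneID; unmatched IDs are skipped."""
    rows = []
    for gene_id, value in sorted(expression.items()):
        info = genes.get(gene_id)
        if info is not None:
            rows.append((gene_id, info["symbol"], info["title"], value))
    return rows


# Invented sample rows; a real run would read the downloaded file instead, e.g.
#   with gzip.open("Mus_musculus.gene_info.gz", "rt") as handle:
#       genes = parse_gene_info(handle)
sample = [
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tchromosome\tmap_location\tdescription",
    "10090\t1\tGeneA\t-\t-\tMGI:MGI:1|Ensembl:ENSMUSG1\t1\t-\texample gene A",
]
genes = parse_gene_info(sample)
table = annotate({"1": 27.5, "999": 3.0}, genes)
```

Skipping GeneIDs that have no match in the annotation file is a design choice of this sketch; the protocol does not say how such cases were handled.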

This final table was saved as a text file and uploaded to the online MySQL database, which served as the source for the expression data display.