As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9,395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually curated ontologies. To understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anticancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. These gene modules have the potential to provide pathway-level insight into new gene expression datasets.
Provides a method for data-driven identification of gene expression modules
The study described in this publication used independent component analysis to extract information about fundamental human gene expression modules from a large compendium of microarray data. This project contains the code and results for this study.
1) An R package containing the code for the project is provided. The package can be used to derive fundamental gene expression modules from new datasets (e.g., for a different species) or to analyze differential expression using the human fundamental components we have already identified.
2) Gene lists and annotations for our 423 human fundamental gene modules are provided for researchers interested in examining the transcriptional modules identified in our analysis.
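To make the second use case concrete, the following is a minimal Python sketch of module-level differential expression. It is an illustration only and does not reproduce the R package or its interface. All names here (module_activity, welch_t, differential_modules, the gene symbols, and the toy values) are hypothetical. It also assumes that a sample's module activity can be approximated by a loading-weighted sum of its gene expression values, which is a simplification of how ICA mixing coefficients would normally be estimated.

```python
from math import sqrt
from statistics import mean, variance


def module_activity(loadings, sample):
    """Score one sample against one module as a loading-weighted sum.

    Assumption: a weighted sum stands in for a properly estimated
    ICA mixing coefficient. Genes missing from the sample count as 0.
    """
    return sum(w * sample.get(gene, 0.0) for gene, w in loadings.items())


def welch_t(a, b):
    """Welch's t-statistic for two groups of activity scores."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def differential_modules(modules, treated, control):
    """Return {module name: t-statistic} for treated vs. control samples."""
    result = {}
    for name, loadings in modules.items():
        t_scores = [module_activity(loadings, s) for s in treated]
        c_scores = [module_activity(loadings, s) for s in control]
        result[name] = welch_t(t_scores, c_scores)
    return result


# Hypothetical toy data: two modules, three samples per group.
modules = {
    "module_A": {"GENE1": 1.0, "GENE2": 0.5},
    "module_B": {"GENE3": 1.0},
}
treated = [
    {"GENE1": 2.0, "GENE2": 2.0, "GENE3": 1.0},
    {"GENE1": 2.2, "GENE2": 1.8, "GENE3": 1.1},
    {"GENE1": 1.8, "GENE2": 2.2, "GENE3": 0.9},
]
control = [
    {"GENE1": 1.0, "GENE2": 1.0, "GENE3": 1.0},
    {"GENE1": 1.1, "GENE2": 0.9, "GENE3": 0.9},
    {"GENE1": 0.9, "GENE2": 1.1, "GENE3": 1.1},
]

if __name__ == "__main__":
    for name, t in differential_modules(modules, treated, control).items():
        print(f"{name}: t = {t:.2f}")
```

In this toy setup, module_A is more active in the treated group while module_B is unchanged, so only module_A receives a large t-statistic. A real analysis would rank all 423 modules this way and correct for multiple testing.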