Introduction

- Despite scientific progress, much remains to be discovered about
  the genetic basis of human disease. Of the approximately 20,000
  genes identified in the human genome to date, only 30% have been
  associated with disease. Predicting which genes are likely to be
  associated with a given disease is one way to help fill this
  knowledge gap.

- Hubs in molecular interaction networks, like busy intersections
  in road networks, provide one way of assessing the importance of a
  node or location. We propose a new method that nominates candidate
  genes for a disease by finding nodes that act as hubs to a greater
  degree in the disease's subnetwork than in the entire network.


Methods

- Starting with a list of disease-associated genes (seeds) and an
  interaction network that includes metabolic, protein-protein, and
  transcription-factor interactions, we built a disease-specific
  subnetwork by generating all shortest paths between pairs of seed
  genes and counting the number of times each node appears on a path.
  Betweenness centrality (a measure of traffic through each node) was
  computed for each node in the subnetwork, and for the same nodes in
  the complete network.

- The node count and the two betweenness measures formed the input to
  a set of algorithms designed to separate the disease-specific nodes
  from their all-disease background. Combinatorial optimization was
  used to find the best combination of algorithm and thresholds on the
  measures, scoring each combination by the average rank of the
  disease's seed genes.
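The steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the NetworkX library, treats the interaction network as undirected and unweighted, and uses hypothetical function names (build_disease_subnetwork, node_measures, average_seed_rank). The algorithms that separate disease-specific nodes from the background are not specified here, so the sketch stops at the three input measures and the average-seed-rank objective.

```python
from collections import Counter
from itertools import combinations

import networkx as nx


def build_disease_subnetwork(graph, seeds):
    """Union of all shortest paths between seed pairs, plus per-node path counts."""
    # Keep only seeds present in the network, dropping duplicates.
    present = [s for s in dict.fromkeys(seeds) if s in graph]
    counts = Counter()
    edges = set()
    for a, b in combinations(present, 2):
        try:
            paths = list(nx.all_shortest_paths(graph, a, b))
        except nx.NetworkXNoPath:
            continue  # seeds in different components contribute no paths
        for path in paths:
            counts.update(path)  # each node on the path is counted once
            edges.update(zip(path, path[1:]))
    sub = nx.Graph()
    sub.add_nodes_from(present)
    sub.add_edges_from(edges)
    return sub, counts


def node_measures(graph, seeds):
    """Map each subnetwork node to (path count, subnetwork BC, full-network BC)."""
    sub, counts = build_disease_subnetwork(graph, seeds)
    bc_sub = nx.betweenness_centrality(sub)
    bc_full = nx.betweenness_centrality(graph)
    return {n: (counts[n], bc_sub[n], bc_full[n]) for n in sub}


def average_seed_rank(scores, seeds):
    """Average 1-based rank of the seeds when nodes are sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = {node: i + 1 for i, node in enumerate(ranked)}
    found = [rank[s] for s in seeds if s in rank]
    return sum(found) / len(found)


# Toy network with made-up node names; A and C play the role of seeds.
toy = nx.Graph([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])
measures = node_measures(toy, ["A", "C"])
toy_scores = {"A": 0.1, "B": 0.9, "C": 0.5}
toy_avg_rank = average_seed_rank(toy_scores, ["A", "C"])
```

In this sketch a lower average seed rank means the known seeds sit nearer the top of the list, so a search over algorithms and thresholds would minimize it. Computing betweenness on the complete network is the expensive step for a large graph; an approximation such as the k-sample option of nx.betweenness_centrality may be needed in practice.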

Results

- We performed two kinds of validation. First, the distribution of a
  disease's known seeds among the node scores was analyzed. More than
  50% of all seeds occurred in the top 11% of the ranked list for
  Schizophrenia, showing strong enrichment of disease-related genes.
  Of the top 5 genes on this list not already known to be associated
  with Schizophrenia, 3 had been connected to Autism, Brain Diseases,
  or other neurological disorders in previously published scientific
  literature. Second, a list of 244 newly discovered genes
  related to Schizophrenia and not used for training was scored using
  our method. Of the 89 genes that appeared in our interaction
  network, 48% appeared in the top 18% of our ranked scores, once again
  demonstrating significant enrichment.
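The enrichment figures above are fractions of a gene set that fall within the top portion of a ranked list. A minimal sketch of that calculation follows, using only the Python standard library; the function name fraction_in_top and the toy gene identifiers are hypothetical, and the sketch reports the fraction only, not a significance test.

```python
import math


def fraction_in_top(ranked_genes, gene_set, top_fraction):
    """Return (fraction of scored genes in the top slice, number of scored genes).

    ranked_genes: genes ordered from best to worst score.
    gene_set: genes to look for (e.g. seeds or a held-out list).
    top_fraction: size of the top slice as a fraction of the ranked list.
    """
    ranked_lookup = set(ranked_genes)
    # Only genes that appear in the network (and so received a score) count.
    scored = [g for g in gene_set if g in ranked_lookup]
    if not scored:
        return 0.0, 0
    cutoff = math.ceil(len(ranked_genes) * top_fraction)
    top = set(ranked_genes[:cutoff])
    hits = sum(1 for g in scored if g in top)
    return hits / len(scored), len(scored)


# Toy example with made-up identifiers; gX is absent from the ranked list.
toy_ranked = [f"g{i}" for i in range(1, 11)]
toy_held_out = ["g1", "g2", "g9", "gX"]
toy_fraction, toy_scored = fraction_in_top(toy_ranked, toy_held_out, 0.5)
```

Genes missing from the network are excluded from the denominator, mirroring the way only the 89 of 244 held-out genes present in the interaction network were evaluated.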

Conclusion

- We have presented a method to propose new disease-gene associations
  from known associations and a large molecular interaction network by
  using network properties to find disease-specific hubs. Our method
  is an effective way to predict novel candidate disease genes for
  experimental validation.