Primary Publication
J. Barhak, A. Garrett, Population Generation from Statistics Using Genetic Algorithms with MIST + INSPYRED. MODSIM World 2014, April 15-17, Hampton Roads Convention Center, Hampton, VA (2014)

Clinical trial population information is typically restricted, and individual-level data is not public. However, clinical trial results and population statistics are regularly published. It is possible to reconstruct mock individual-level populations from these statistics to support disease modeling and to better understand population characteristics. This can help in both the planning and the analysis of trial results on a larger information scope involving multiple clinical trials. A fairly simple example of generating an individual population from aggregate statistics is as follows: generate 1000 individuals such that their mean age is 61 with an SD of 8.2 and their mean age at diagnosis of diabetes is 53. Even this simple example carries a constraint: age at diagnosis of diabetes should be lower than the individual's age, which skews the distribution. Reconstructing a mock population that matches clinical trial statistics is more complex and involves multiple objectives and interactions between statistics. This work improves the Monte Carlo abilities of the MIcro Simulation Tool (MIST) to generate populations from statistics by introducing genetic algorithms supported by the INSPYRED software package. The genetic algorithm improves the accuracy of the reconstructed population and better handles skewed distributions and constraints. MIST and INSPYRED are both free software, available for download under the GPL license.
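The simple example in the abstract (1000 individuals, mean age 61, SD 8.2, mean age at diagnosis 53, diagnosis before current age) can be illustrated with a minimal sketch. It is not the MIST or INSPYRED implementation: it uses only the Python standard library and replaces the genetic algorithm with a plain accept-if-not-worse stochastic search. The Gaussian sampling, the SD of age at diagnosis (`ASSUMED_DX_SD`), and all function names are assumptions made for illustration.

```python
import math
import random

TARGET_AGE_MEAN = 61.0   # from the abstract
TARGET_AGE_SD = 8.2      # from the abstract
TARGET_DX_MEAN = 53.0    # from the abstract
ASSUMED_DX_SD = 8.0      # assumption: the abstract gives no SD for age at diagnosis


def draw_individual(rng):
    """Draw (age, age_at_diagnosis); reject draws that break diagnosis < age."""
    while True:
        age = rng.gauss(TARGET_AGE_MEAN, TARGET_AGE_SD)
        dx_age = rng.gauss(TARGET_DX_MEAN, ASSUMED_DX_SD)
        if 0.0 < dx_age < age:
            return age, dx_age


def population_error(sums, n):
    """Sum of absolute deviations from the three target statistics."""
    sum_age, sum_age_sq, sum_dx = sums
    mean_age = sum_age / n
    # Population variance from running sums, clamped against rounding error.
    var_age = max(sum_age_sq / n - mean_age ** 2, 0.0)
    return (abs(mean_age - TARGET_AGE_MEAN)
            + abs(math.sqrt(var_age) - TARGET_AGE_SD)
            + abs(sum_dx / n - TARGET_DX_MEAN))


def reconstruct_population(n=1000, steps=20000, seed=0):
    """Return (population, error before search, error after search)."""
    rng = random.Random(seed)
    pop = [draw_individual(rng) for _ in range(n)]
    sums = [sum(a for a, _ in pop),
            sum(a * a for a, _ in pop),
            sum(d for _, d in pop)]
    start_error = population_error(sums, n)
    best = start_error
    for _ in range(steps):
        # Propose replacing one random individual with a fresh valid draw.
        i = rng.randrange(n)
        old_age, old_dx = pop[i]
        new_age, new_dx = draw_individual(rng)
        trial = [sums[0] - old_age + new_age,
                 sums[1] - old_age ** 2 + new_age ** 2,
                 sums[2] - old_dx + new_dx]
        trial_error = population_error(trial, n)
        # Keep the replacement only if the population is not further from target.
        if trial_error <= best:
            pop[i] = (new_age, new_dx)
            sums, best = trial, trial_error
    return pop, start_error, best


if __name__ == "__main__":
    population, before, after = reconstruct_population()
    print("error before search:", before)
    print("error after search:", after)
```

Plain rejection sampling alone shifts the sample means away from the targets, because discarding individuals with diagnosis after current age skews both distributions. The search step is what pulls the statistics back while every individual still satisfies the constraint.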

Related Publications
J. Barhak, The Reference Model for Disease Progression uses MIST to find data fitness. PyData Silicon Valley 2014, held at Facebook Headquarters (2014)

The Reference Model for Disease Progression [1,2] is a league of disease models that compete amongst themselves to fit existing clinical trial results. Clinical trial results are widely available publicly at the summary level. Disease models are typically extracted from a single trial and therefore may not generalize well to all populations. The Reference Model determines the fitness of multiple disease models for multiple populations and helps deduce better-fitting scenarios. This is done using High Performance Computing (HPC) techniques that support Monte Carlo simulation at the micro (individual) level. The MIcro Simulation Tool (MIST) [3,4] facilitates running those simulations in an HPC environment using Sun Grid Engine (SGE) [5]. MIST can even run over the cloud using StarCluster [7] and an Anaconda AMI [8]. Note, however, that the publicly available data is summary data, while simulations are conducted at the individual level. The individual population is reconstructed from summary data using the MIST Domain Specific Language (DSL) and is optimized with evolutionary computation using Inspyred [6]. This allows creating populations that conform to the clinical trial summary statistics, incorporate trial inclusion and exclusion criteria, and cope with skewed population distributions. The Reference Model allows exploring new assumptions and hypotheses about disease progression and determines their fitness to existing population/model data. These virtual trials consider much more information than a single trial, using already available and public data.

Links to relevant own publications:
[1] The Reference Model video:
[2] The Reference Model short description:
[3] MIST video presentation:
[4] MIST github repository:

Links to external free software tools relevant to this work:
[5] SGE:
[6] INSPYRED github repository:
[7] StarCluster home page:
[8] B. Zaitlen, StarCluster Anaconda. Online:
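The idea of scoring multiple disease models against multiple populations can be pictured as a fitness matrix. The sketch below is only an illustration of that bookkeeping: the real Reference Model obtains its predictions from Monte Carlo micro-simulation in MIST, whereas here each model is a hypothetical closed-form function. Every name and number is a made-up placeholder, not trial data.

```python
# All names and numbers below are hypothetical placeholders, not trial data.
populations = {
    "trial_A": {"mean_age": 55.0, "observed_rate": 0.10},
    "trial_B": {"mean_age": 65.0, "observed_rate": 0.20},
}

# Each "model" maps a population summary to a predicted outcome rate.
models = {
    "flat_model": lambda summary: 0.15,
    "age_model": lambda summary: 0.01 * (summary["mean_age"] - 45.0),
}


def fitness_matrix(models, populations):
    """Absolute prediction error for every (model, population) pair."""
    return {
        model_name: {
            pop_name: abs(predict(summary) - summary["observed_rate"])
            for pop_name, summary in populations.items()
        }
        for model_name, predict in models.items()
    }


def rank_models(matrix):
    """Model names ordered from best (lowest total error) to worst."""
    totals = {name: sum(row.values()) for name, row in matrix.items()}
    return sorted(totals, key=totals.get)


if __name__ == "__main__":
    matrix = fitness_matrix(models, populations)
    for name in rank_models(matrix):
        print(name, matrix[name])
```

Summing errors across populations is one possible aggregation, chosen here for brevity. A model derived from a single trial can score well on that trial's column and poorly on the others, which is the generalization problem the abstract describes.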

J. Barhak, MIST: Micro-Simulation Tool to Support Disease Modeling. SciPy 2013, Bioinformatics track (2013)

MIST stands for Micro-Simulation Tool. It is a modeling and simulation framework that supports computational chronic disease modeling activities. It is a fork of the Indirect Estimation and Simulation Tool (IEST), a GPL modeling framework. MIST removes the complexity associated with the estimation engine, with parameter definitions, and with rule restrictions. This significantly simplifies the system and lets development along the micro-simulation path proceed less encumbered. The incentive to split off MIST was to adapt the code to newer compiler technology to speed up simulations. The medical disease modeling community is skeptical of using interpreters for simulations because of performance concerns, and that skepticism is misplaced. The use of advanced compiler technology with Python may remedy this misconception and provide optimized Python-based simulations. MIST is a first step in this direction. MIST also resolves a few documented, known issues and moves to newer scientific Python stacks such as Anaconda and PythonXY as its platform. This makes it more accessible to less sophisticated users, who can now benefit from easier installation. The Reference Model for disease progression intends to use MIST as its main platform. Yet MIST is equipped with a micro-simulation compiler designed to accommodate Monte Carlo simulations for other purposes as well.

Additional Information:
- Video of Talk:
- Source Code:
- Slogan: MIST runs over the cloud!