Topics to cover in "MBAR for dummies" (aim for JCTC / PLOS ONE? Review?)

Introduction
* Summary of theory at a less technical level
* Relation to previous theories (WHAM, BAR, FEP)
* References to more complicated theory

Different ways to look at it:
* Maximum likelihood estimates of free energies and observables for the data collected.
* Best possible weighting estimator using only the samples obtained (and no other information).
* Interpretation of the density of states as a sum of weighted delta functions.
* "Weighted" FEP from a mixture distribution:
  exp(-f_i) = (1/N) \sum_{n=1}^N exp(-u_i(x_n)) / (\sum_{k=1}^K (N_k/N) exp(f_k - u_k(x_n)))

Nice features:
* Describe the overlap matrix as a tool to understand overlap between states (which example do we use this for?)
* Discussion of the uncertainty estimate (reference the answers with free energy calculations)
* Valid for arbitrary non-Boltzmann distributions (is there a good example?)
* Rewrite the formalism in terms of a KxN matrix instead of a KxKxN matrix.
* Computing expectation values is also made trivial by MBAR.
* Discuss bootstrapping to calculate errors, block bootstrapping, and correlations (note: is this our solution for large numbers of states? Give up on the analytic error estimate and just parallelize lots of bootstraps?)
* Can we add bootstrapping tools?
* Discussing and illustrating good properties
  * Normality of error estimates (harmonic examples), and agreement of uncertainty estimates (cite other papers).
  * Speed and convergence of solutions, and what we did to get it.
* Interesting things that can be done
  * Scans of simulation parameters (Paliwal paper)
  * Comparison to WHAM (perhaps include a binning functionality to allow explicit/direct comparison to WHAM)?
  * We have an example of lysozyme data to compare (this is in manuscripts/wham-compare/umbrella-sampling)
  * PMFs: WHAM underweights the barrier; it's a general problem of kernel density estimation with a top-hat kernel.
  * Interpretation of histogram WHAM as MBAR with discretized energies.
* Describing the included functionality
  * Describe capabilities (expectations)
  * timeseries module
* Solutions to different problems (there should be example code for each)
  * In each one, demonstrate the appropriate correlation time.
  * Calculating alchemical free energies (alchemical-free-energy) (MRS)
  * Calculating expectation values (harmonic-oscillators, in the test cases) (MRS)
  * Calculating entropy and enthalpy (working on a separate paper; can refer to the notes entropy-and-enthalpy) (MRS)
  * Calculating PMFs from umbrella sampling, including entropy decomposition (umbrella-sampling-pmf) (john)
  * Multidimensional PMF from parallel exchange (parallel-tempering-2dpmf) (john)
  * Gibbs sampling paper (multidimensional-umbrella-sampling) (john)
  * Experimental data (single-molecule-pulling) (john)
  * Temperature-dependent heat capacities from parallel tempering (heat-capacity, MRS; also see shirtgroup/pygo, comparison of fluctuation and temperature-derivative properties)
  * Computing properties over multidimensional spaces (pymbar-datasets/gas-properties)
  * Consistency: can we predict the data with state i? Are all the state samples consistent? Leave out the state and see if you get the same answer?
  * Possible warnings

What did the Hummer paper say?

--------------

Short-term improvements (all done!)
* Check the new matrix version of the N-R equations (gives correct answers under normal circumstances)
* A simple validation suite (analytical answers, doesn't crash for tough problems, runs in 2-3 min)
* Check multiple expectations
* Fix perturbed expectations
* John needs to look at it.

Robustly test it (all done!)
* Test all functions with analytical answers.
* Write validation suite.

Medium term (1 month)
* Tutorial for each example on the pymbar web page
  * prepared in parallel
  * prepared in Sphinx
* Consistency: can we predict the data with state i? Are all the state samples consistent? Leave out the state and see if you get the same answer?
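As a minimal sketch of the "weighted FEP from a mixture distribution" view listed under "Different ways to look at it", the MBAR equations can be solved by plain self-consistent iteration on a KxN matrix of reduced potentials; the same weights then give expectation values and the overlap matrix. This is not the Newton-Raphson solver mentioned in the short-term list. The function names, tolerances, and toy harmonic-oscillator states (u_k(x) = k x^2 / 2, reduced units) are illustrative assumptions, as is the overlap definition O_ij = sum_n W_ni W_nj N_j used here.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    s = np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True)) + a_max
    return np.squeeze(s, axis=axis)


def mbar_free_energies(u_kn, N_k, tol=1e-10, max_iter=10000):
    """Solve the MBAR equations by plain self-consistent iteration (a sketch).

    u_kn[k, n]: reduced potential of pooled sample n evaluated in state k.
    N_k[k]:     number of samples that were drawn from state k.
    Returns f_k shifted so that f_0 = 0 (last iterate if not converged).
    """
    K = u_kn.shape[0]
    log_N_k = np.log(N_k)
    f_k = np.zeros(K)
    for _ in range(max_iter):
        # log of the mixture denominator, sum_k N_k exp(f_k - u_k(x_n))
        log_denom_n = logsumexp(log_N_k[:, None] + f_k[:, None] - u_kn, axis=0)
        # f_i = -log sum_n exp(-u_i(x_n)) / denominator_n
        f_new = -logsumexp(-u_kn - log_denom_n[None, :], axis=1)
        f_new -= f_new[0]
        if np.max(np.abs(f_new - f_k)) < tol:
            return f_new
        f_k = f_new
    return f_k


def mbar_weights(u_kn, N_k, f_k):
    """W[n, i] = exp(f_i - u_i(x_n)) / sum_k N_k exp(f_k - u_k(x_n))."""
    log_num = f_k[:, None] - u_kn
    log_denom_n = logsumexp(np.log(N_k)[:, None] + log_num, axis=0)
    return np.exp(log_num - log_denom_n[None, :]).T


def overlap_matrix(W, N_k):
    """O[i, j] = sum_n W[n, i] W[n, j] N_j; each row sums to 1 at convergence."""
    return (W.T @ W) * N_k[None, :]


# Toy harmonic states with known answers: f_k - f_0 = ln(k / k_0) / 2, <x^2>_k = 1 / k
rng = np.random.default_rng(0)
spring_k = np.array([1.0, 2.0, 4.0])
N_k = np.array([2000, 2000, 2000])
x_n = np.concatenate([rng.normal(0.0, 1.0 / np.sqrt(k), n)
                      for k, n in zip(spring_k, N_k)])
u_kn = 0.5 * spring_k[:, None] * x_n[None, :] ** 2

f_k = mbar_free_energies(u_kn, N_k)
W = mbar_weights(u_kn, N_k, f_k)
x2_k = W.T @ x_n**2              # MBAR expectation <x^2> in every state
O = overlap_matrix(W, N_k)
print(f_k, 0.5 * np.log(spring_k / spring_k[0]))   # estimate vs exact
print(x2_k, 1.0 / spring_k)
```

Working with the pooled KxN matrix means expectations in any state, sampled or not, are just a weighted sum over all samples; unsampled states would simply have N_k = 0 in the denominator.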
Long-term improvements (2-3 months, in time for PYMBAR)
* Better documentation (handled)
* More robust validation; make sure it passes all the "hard" problems people have given us, and harden all parts that can be hardened.
* GPU extensions
* Make examples from hard problems people give us?

Extension of functionality (after that; probably separate papers)
* Sum over states instead of sum over samples (where K >> N) (Michael will work on this); useful for lambda-dynamics
* Poor man's DCMBAR for PMFs (in situ evaluation of PMFs + parallelization)

Old notes:
* Examples where the variance estimator fails.
* Check why the MBAR and BAR variance estimators fail for bad overlap!
* Check the example sent to John.
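For the bootstrapping items in the topics list, and for cases where the analytic variance estimator fails, a moving-block bootstrap is one model-free alternative. The sketch below uses a toy AR(1) series, whose statistical inefficiency g = (1 + phi) / (1 - phi) is known exactly, to show that an i.i.d. bootstrap underestimates the error of a mean by roughly sqrt(g), while blocks much longer than the correlation time recover it. The function names, block length, and replicate count are hypothetical choices, not pymbar's API; for MBAR one would presumably resample blocks within each state and re-solve the free energies inside the estimator.

```python
import numpy as np


def block_bootstrap_std(x, estimator, block_len, n_boot=500, seed=0):
    """Moving-block bootstrap standard error of estimator(x) (hypothetical helper).

    Resamples contiguous blocks of length block_len, so correlations inside
    each block are preserved. block_len=1 is the ordinary i.i.d. bootstrap.
    """
    rng = np.random.default_rng(seed)
    n_blocks = len(x) // block_len
    n_starts = len(x) - block_len + 1          # valid block start positions
    offsets = np.arange(block_len)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()
        estimates[b] = estimator(x[idx])
    return estimates.std(ddof=1)


def ar1_series(phi, n, seed=1):
    """Toy correlated series x_t = phi * x_{t-1} + unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n)
    x = np.empty(n)
    x[0] = noise[0] / np.sqrt(1.0 - phi**2)    # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


phi, n = 0.9, 20000
x = ar1_series(phi, n)
g = (1 + phi) / (1 - phi)                      # exact statistical inefficiency of AR(1)
exact = np.sqrt(g / (1 - phi**2) / n)          # asymptotic std. error of the mean
naive = block_bootstrap_std(x, np.mean, block_len=1)
blocked = block_bootstrap_std(x, np.mean, block_len=200)
print(exact, naive, blocked)                   # naive is too small by about sqrt(g)
```

The replicates are independent, so the loop over n_boot parallelizes trivially, which is the appeal noted in the topics list for large numbers of states.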