EuclidComputeDrift:
  DONE 1) Behave well both with and without mpi
  DONE 2) Process new yaml format, esp. taus and minkowski p value
  DONE 3) Print output to disk
  DONE 4) Naming convention for output files? Pick one
  5) Document what the paths in the yaml input are relative to

drift.py:
  1) Implement contact_drift()

leastsq.py:
  DONE 1) ols() breaks without any informative error when the predictor matrix is of deficient rank.

EuclidRunLSRegression:
  1) Make it take a list of hdf5 files instead of a directory as positional arguments, so you can use bash shell expansion to more easily pick which metrics you want as input
  2) --tau flag to select only a subset of the taus to use. I'm curious to see how the correlation coefficient changes as you increase the range of taus used
  3) --ols flag to also run ols (always runs nnls)
  4) --norm flag to normalize each predictor by its standard deviation
  5) Print output to file: output format? (Needs to document the order that the betas are in with respect to the metrics.) Maybe the output should be a hash of metric name to value, along with R^2 and stuff. YAML?
  6) Make sure the code is doing the right thing when there is only 1 predictor. Did I see a negative R^2?

?More Powerful Regression Methods, feature selection and significance testing?
  1) Integrate with R (Rpy) for access to these libraries.
  2) epsilon-SVR? Through Shogun? http://www.shogun-toolbox.org/
     - Will linear combinations of the metrics in the higher dimension space (implicitly via kernel trick) be guaranteed to be a metric themselves? I think so. But what about negative coefficients?

DONE - Ultimately, we also need a subclass or drop-in replacement of msmbuilder's DistanceMetric. It should use the same "PrepareData" calls, but instead of PrepareData returning a TheoData, it's going to return a container with a TheoData & contact vector & dihedral vector.
Need to look at the methods that call DistanceMetric more closely to see how they use it, since nothing is really private and there isn't a well-defined separation of implementation and interface.
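
Note on the single-predictor question under EuclidRunLSRegression (item 6): a negative R^2 is not necessarily a bug. If the fit has no intercept term (an assumption here, not checked against leastsq.py) and R^2 is computed as 1 - SS_res/SS_tot around the mean of y, then a non-negative fit to an anti-correlated predictor can do worse than just predicting the mean, which gives R^2 < 0. The sketch below uses only numpy; nnls_single() and r_squared() are hypothetical helper names, not functions from this code base.

```python
import numpy as np


def nnls_single(x, y):
    """Non-negative least squares for ONE predictor and no intercept.

    Minimizes ||y - beta * x||^2 subject to beta >= 0. For a single
    predictor the solution is the unconstrained slope clamped at zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return max(0.0, float(np.dot(x, y) / np.dot(x, x)))


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around mean(y)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


# Toy data: the predictor is positive but anti-correlated with the response.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.0, 3.0, 2.0, 1.0])

beta = nnls_single(x, y)      # positive slope, forced through the origin
r2 = r_squared(y, beta * x)   # negative: worse than predicting mean(y)
print(beta, r2)
```

So for item 6 the thing to check is whether the negative value came from a case like this (no intercept, poorly matched predictor) or from an actual indexing or shape problem when the predictor matrix has a single column.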