Hi Alex,
I'm not one of the developers, but I've mucked around in the code a decent amount, so hopefully I can give you a bit of a hand.
The sequence of states isn't saved in the Trajectories directory. I believe that directory should only contain the concatenated trajectory files for each run you are using to build the MSM (the raw data, or potentially a subset of the atoms in your system), stored in .lh5 format. The 'l' stands for 'lossy', meaning the precision has been reduced, as is done with the xtc format in Gromacs. I don't think this is actually a different format from .h5, but rather just a naming convention used by MSMBuilder.
I think the file you're looking for is Assignments.h5, or possibly Assignments.Fixed.h5, which should have been generated when you ran the Assign.py script and is placed in your 'Data' directory by default. I don't have much experience with the tools that come with hdf5, like h5dump. One potential issue is that the PyTables package used to create the .h5 files, while it depends on hdf5, might not write something that is strictly portable, since it adds some stuff beyond what is implemented in the standard. I'm not 100% sure about that, though.
Probably the easiest thing to do is to write a small Python script that imports some of the MSMBuilder modules that handle data storage and uses them to load the data from the .h5 file. Alternatively, you can access it directly with PyTables (http://www.pytables.org) using something like the following:
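import tables
# Open read-only; open_file is the PyTables 3.x name (openFile in 2.x releases)
f = tables.open_file('Assignments.h5', 'r')
# Read the whole 'Data' node into memory as a numpy array
assignments = f.root.Data[:]
f.close()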
Now assignments is a 2D array with one row per trajectory and one column per frame, so its shape is (num_trajectories, number of frames in the longest trajectory). Each entry is the discrete state index for that frame. There will also be entries of -1, which fill in elements where a trajectory is shorter than the longest one, or where the state a frame was assigned to was removed during ergodic pruning.
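To show how the -1 entries can be handled, here is a rough sketch using plain numpy. The small array is made up for illustration and only stands in for the one loaded from Assignments.h5; the variable names are mine, not anything from MSMBuilder.

```python
import numpy as np

# Made-up stand-in for the array loaded from Assignments.h5:
# 2 trajectories, the longest has 5 frames, -1 marks padding or pruned states.
assignments = np.array([[0, 2, 2, -1, 1],
                        [1, 1, -1, -1, -1]])

valid = assignments != -1                      # True where a frame has a real state
frames_per_traj = valid.sum(axis=1)            # assigned frames in each trajectory
populations = np.bincount(assignments[valid])  # number of frames in each state

# State sequence of each trajectory with the -1 entries dropped
sequences = [row[row != -1] for row in assignments]
```

Keep in mind that a -1 in the middle of a row (from pruning) is a real gap in time, so dropping it joins frames that were not actually consecutive. That is fine for counting populations, but probably not for counting transitions.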
You can then export the numpy assignments array to a text file or some other format for analysis, although I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow.
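For the text export, something along these lines should work with numpy's savetxt. Again the array is a made-up stand-in, and the output path is just a temporary file I picked for the example.

```python
import os
import tempfile

import numpy as np

# Made-up stand-in for the assignments array read from the .h5 file
assignments = np.array([[0, 2, 2, -1, 1],
                        [1, 1, -1, -1, -1]])

# Write one trajectory per line, as whitespace-separated integers
out_path = os.path.join(tempfile.mkdtemp(), "assignments.txt")
np.savetxt(out_path, assignments, fmt="%d")

# Read it back to check that the round trip preserves the values
reloaded = np.loadtxt(out_path, dtype=int)
```

Each line of the text file is then one trajectory, with the -1 padding still in place so every line has the same number of columns.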
I hope this helps.
Josh