
Convert .lh5 trajectory files to readable format?

Posted: Thu Apr 12, 2012 8:19 am
by adickson
Hi,

I'm using MSMBuilder2.0 (and enjoying it very much so far :D), and I am trying to develop an external program that uses the output of the files in the "Trajectory" folder (EDIT: this is actually in Data/Assignments.h5), which show the sequence of states visited along each trajectory (if I remember from MSMBuilder 1). Right now they are in .lh5 format. I tried using "h5dump" with no success, so I guess .lh5 files are different than .h5?

Is there an easy way to do this?

Thanks,

Alex

Re: Convert .lh5 trajectory files to readable format?

Posted: Thu Apr 12, 2012 9:17 pm
by jadelman
Hi Alex,

I'm not one of the developers, but I've mucked around in the code a decent amount, so hopefully I can give you a bit of a hand.

The sequence of states isn't going to be saved in the Trajectories directory. I believe that should only contain the concatenated trajectory files (raw data or potentially a subset of atoms in your system) for each run that you are using to build the MSM, in .lh5 format. The 'l' stands for 'lossy', meaning the precision has been reduced, as is done with the xtc format in Gromacs. I don't think this is actually a different format than .h5, but rather just a naming convention used by MSMBuilder.

I think the file you're looking for is Assignments.h5 or possibly Assignments.Fixed.h5, which should have been generated when you ran the Assign.py script and will, by default, be placed in your 'Data' directory. I don't have much experience with the tools that come with hdf5, like h5dump. One potential issue is that the PyTables package used to create the .h5 files, while it depends on hdf5, might not write something that is strictly portable, since it adds some stuff beyond what is implemented in the standard, but I'm not 100% sure about that.

Probably the easiest thing to do is to write a small python script that imports some of the modules in MSMBuilder that handle the data storage to read in/load the data from the .h5 file. Alternatively you can access it directly using pytables (http://www.pytables.org) using something like the following:

Code:

import tables

# open_file is the PyTables 3.x name; PyTables 2.x calls it openFile
f = tables.open_file('Assignments.h5', 'r')
assignments = f.root.Data[:]  # read the whole array into memory as a numpy array
f.close()

Now assignments is a 2-D array of shape (num_trajectories, max number of frames in any one trajectory), where each entry is the discrete state index for that frame. Entries of -1 fill in elements where a trajectory is shorter than the longest trajectory, or where the state a frame was assigned to was removed during ergodic pruning.
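
If you want one state sequence per trajectory without the -1 entries, something like this sketch should work. The array here is a made-up toy stand-in for what you'd load from Assignments.h5, and state_sequences is just a helper name I picked, not anything from MSMBuilder.

```python
import numpy as np

# Toy stand-in for the array loaded from Assignments.h5 (values are made up).
# Rows are trajectories, columns are frames, -1 marks padding or pruned states.
assignments = np.array([
    [0, 1, 1, 2, -1],
    [2, 2, -1, -1, -1],
    [0, -1, 3, 3, 1],
])

def state_sequences(assignments):
    """Return one 1-D array of state indices per trajectory, dropping -1 entries."""
    return [row[row != -1] for row in assignments]

sequences = state_sequences(assignments)
for i, seq in enumerate(sequences):
    print(i, seq.tolist())
```

One caveat: a -1 in the middle of a trajectory (from pruning) gets dropped too, which makes frames that were not originally adjacent look adjacent. If you are counting transitions, you may prefer to keep the -1 entries and skip over them instead.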

You can then export the numpy assignments array to a text file or some other format for analysis, although I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow.
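
For the text export, a minimal sketch using numpy's savetxt is below. The array is again a made-up stand-in for the real assignments, and the output path is a temporary file chosen only for illustration. It writes one trajectory per line with space-separated integer state indices, padding included.

```python
import os
import tempfile

import numpy as np

# Made-up stand-in for the assignments array read from Assignments.h5.
assignments = np.array([
    [0, 1, 1, 2],
    [2, 2, -1, -1],
])

# Illustrative output location; use whatever path your external program expects.
out_path = os.path.join(tempfile.mkdtemp(), "assignments.txt")

# fmt="%d" writes integers rather than the default floating-point notation.
np.savetxt(out_path, assignments, fmt="%d")

# Read the file back to check that the round trip preserves the values.
reloaded = np.loadtxt(out_path, dtype=int)
```

The resulting file is plain text, so it can be read from pretty much any language without needing hdf5 libraries.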

I hope this helps.

Josh

Re: Convert .lh5 trajectory files to readable format?

Posted: Fri Apr 13, 2012 6:15 am
by adickson
Thanks Josh! This works wonderfully.
jadelman wrote:
"I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow."

I'd believe it. And I know I should learn python one day, but I'm happy that today is not that day ;)

Thanks again..

Alex

Re: Convert .lh5 trajectory files to readable format?

Posted: Tue Jun 19, 2012 2:13 pm
by rmcgibbo
The easiest way to do this is to use MSMBuilder's Serializer object which wraps pytables.

In msmbuilder2.5, you can get the object from an interactive python interpreter with "from msmbuilder import Serializer". In msmbuilder2.0, you can get the object with "from msmbuilder.Serializer import Serializer".

Then you can open h5 files like Assignments.h5 with "Serializer.LoadFromHDF('filename.h5')".

-Robert