Convert .lh5 trajectory files to readable format?

Alex Dickson
Posts: 3
Joined: Fri Mar 30, 2012 2:07 pm


Post by Alex Dickson » Thu Apr 12, 2012 8:19 am

Hi,

I'm using MSMBuilder2.0 (and enjoying it very much so far :D), and I am trying to develop an external program that reads the files in the "Trajectory" folder (EDIT: this is actually in Data/Assignments.h5), which, if I remember correctly from MSMBuilder 1, show the sequence of states visited along each trajectory. Right now they are in .lh5 format. I tried using "h5dump" with no success, so I guess .lh5 files are different from .h5?

Is there an easy way to do this?

Thanks,

Alex
Last edited by Alex Dickson on Fri Apr 13, 2012 6:18 am, edited 1 time in total.

Joshua Adelman
Posts: 20
Joined: Thu Feb 21, 2008 4:42 pm

Re: Convert .lh5 trajectory files to readable format?

Post by Joshua Adelman » Thu Apr 12, 2012 9:17 pm

Hi Alex,

I'm not one of the developers, but I've mucked around in the code a decent amount, so hopefully I can give you a bit of a hand.

The sequence of states isn't saved in the Trajectories directory. I believe that directory only contains the concatenated trajectory files for each run you are using to build the MSM (the raw coordinates, or possibly a subset of the atoms in your system), stored in .lh5 format. The 'l' stands for 'lossy': the precision has been reduced, as is done with the xtc format in Gromacs. I don't think it's actually a different format from .h5, just a naming convention used by MSMBuilder.
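If you want to check for yourself whether an .lh5 file is ordinary HDF5, one way is to look for the HDF5 format signature. Here is a minimal sketch, assuming only the standard HDF5 layout: the 8-byte signature sits at byte 0 or, when the file starts with a user block, at offset 512, 1024, 2048 and so on. The name is_hdf5 is hypothetical and not part of MSMBuilder; if you have PyTables installed, tables.is_hdf5_file performs the same kind of check.

```python
import os

# 8-byte signature that begins every HDF5 superblock (HDF5 file format spec).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5(path: str) -> bool:
    """Hypothetical helper: True if `path` carries the HDF5 signature.

    The signature is at offset 0, or at 512, 1024, 2048, ... when the
    file begins with a user block, so every candidate offset is checked.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate offset: 0 -> 512 -> 1024 -> 2048 -> ...
            offset = 512 if offset == 0 else offset * 2
    return False
```

For example, is_hdf5('Trajectories/trj0.lh5') (a hypothetical path) should return True if the .lh5 file really is just HDF5 under another extension.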

I think the file you're looking for is Assignments.h5, or possibly Assignments.Fixed.h5, which should have been generated when you ran the Assign.py script and is placed in your 'Data' directory by default. I don't have much experience with the tools that come with HDF5, like h5dump, but one potential issue is that PyTables, the package used to create the .h5 files, builds on HDF5 but adds some of its own metadata beyond what the standard defines, so its files might not be strictly portable. I'm not 100% sure about that, though.

Probably the easiest thing to do is to write a small Python script that imports MSMBuilder's data-storage modules and uses them to load the data from the .h5 file. Alternatively, you can read it directly with PyTables (http://www.pytables.org) using something like the following:

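```python
import tables
# open_file is the PyTables >= 3.0 name (2.x: openFile); 'with' closes the file.
with tables.open_file('Assignments.h5', 'r') as f:
    # Copy the whole 'Data' array into memory as a numpy array.
    assignments = f.root.Data[:]
```
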
Now assignments is an array of shape (num_trajectories, max_frames), where max_frames is the length of the longest trajectory, and each entry is the discrete state index for that frame. Entries of -1 fill in the elements past the end of any trajectory shorter than the longest one, and also mark frames whose assigned state was removed during ergodic pruning.
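Because -1 marks both end-of-trajectory padding and pruned frames, simply deleting every -1 would splice together frames that were not adjacent in time. A minimal sketch, assuming assignments is the 2-D integer numpy array read above, that instead splits each trajectory into contiguous runs of valid (non-negative) states; the function name valid_segments is hypothetical.

```python
import numpy as np


def valid_segments(assignments):
    """Hypothetical helper: for each trajectory (row), return the contiguous
    runs of frames whose state index is >= 0. Padding and pruned frames (-1)
    act as separators, so no run spans a gap in time."""
    result = []
    for row in np.asarray(assignments):
        valid = (row >= 0).astype(int)
        # Pad with 0 on both sides; diff is +1 where a run starts and -1 just after it ends.
        edges = np.flatnonzero(np.diff(np.concatenate(([0], valid, [0]))))
        starts, stops = edges[::2], edges[1::2]
        result.append([row[a:b] for a, b in zip(starts, stops)])
    return result
```

For a row such as [0, 2, 2, -1, 5, -1, -1], this yields the two runs [0, 2, 2] and [5], so the pruned frame is never treated as a transition from state 2 to state 5.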

You can then export the numpy assignments array to a text file or some other format for analysis, although I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow.
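For the export step, numpy's savetxt can write the array as whitespace-separated integers with one trajectory per line, which is easy to parse from another language. A minimal sketch; the filename assignments.txt and the helper name export_assignments are only examples, and the -1 entries are kept so every line has the same number of columns.

```python
import numpy as np


def export_assignments(assignments, path="assignments.txt"):
    """Hypothetical helper: write one trajectory per line as space-separated
    integer state indices. -1 entries (padding or pruned frames) are written
    unchanged, so the reading program must treat them as 'no state'."""
    np.savetxt(path, np.asarray(assignments, dtype=int), fmt="%d", delimiter=" ")
```

The file can be read back in Python with np.loadtxt(path, dtype=int), or line by line in any other language.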

I hope this helps.

Josh

Alex Dickson
Posts: 3
Joined: Fri Mar 30, 2012 2:07 pm

Re: Convert .lh5 trajectory files to readable format?

Post by Alex Dickson » Fri Apr 13, 2012 6:15 am

Thanks Josh! This works wonderfully.
Josh wrote: "I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow."
I'd believe it. And I know I should learn python one day, but I'm happy that today is not that day ;)

Thanks again.

Alex

Robert McGibbon
Posts: 20
Joined: Tue Jul 19, 2011 9:25 am

Re: Convert .lh5 trajectory files to readable format?

Post by Robert McGibbon » Tue Jun 19, 2012 2:13 pm

The easiest way to do this is to use MSMBuilder's Serializer class, which wraps PyTables.

In MSMBuilder 2.5, you can import it in an interactive Python interpreter with "from msmbuilder import Serializer". In MSMBuilder 2.0, use "from msmbuilder.Serializer import Serializer" instead.

Then you can open .h5 files such as Assignments.h5 with "Serializer.LoadFromHDF('filename.h5')".

-Robert
