Hi,
I'm using MSMBuilder2.0 (and enjoying it very much so far), and I am trying to develop an external program that uses the files in the "Trajectory" folder (EDIT: this is actually in Data/Assignments.h5), which show the sequence of states visited along each trajectory (if I remember from MSMBuilder 1). Right now they are in .lh5 format. I tried using "h5dump" with no success, so I guess .lh5 files are different from .h5?
Is there an easy way to do this?
Thanks,
Alex
Convert .lh5 trajectory files to readable format?
- Alex Dickson
- Posts: 3
- Joined: Fri Mar 30, 2012 2:07 pm
Last edited by Alex Dickson on Fri Apr 13, 2012 6:18 am, edited 1 time in total.
- Joshua Adelman
- Posts: 20
- Joined: Thu Feb 21, 2008 4:42 pm
Re: Convert .lh5 trajectory files to readable format?
Hi Alex,
I'm not one of the developers, but I've mucked around in the code a decent amount, so hopefully I can give you a bit of a hand.
The sequence of states isn't going to be saved in the Trajectories directory. I believe that should only contain the concatenated trajectory files (raw data or potentially a subset of atoms in your system) for each run that you are using to build the MSM in .lh5 format, where the 'l' stands for 'lossy' in that the precision has been reduced such as done with the xtc format in Gromacs. I don't think this is actually a different format than .h5, but rather just a naming convention used by MSMBuilder.
I think the file you're looking for is Assignments.h5 or possibly Assignments.Fixed.h5, which should have been generated when you ran the Assign.py script and will, by default, be placed in your 'Data' directory. I don't have much experience with the tools that come with hdf5, like h5dump, but one potential issue is that the Pytables package used to create the .h5 files, while it depends on hdf5, might not write something that is strictly portable, since it adds some stuff beyond what is implemented in the standard. I'm not 100% sure about that, though.
Probably the easiest thing to do is to write a small python script that imports some of the modules in MSMBuilder that handle the data storage to read in/load the data from the .h5 file. Alternatively you can access it directly using pytables (http://www.pytables.org) using something like the following:
Code:
import tables
# Open the assignments file read-only
f = tables.File('Assignments.h5','r')
# Read the whole 'Data' node into memory as a numpy array
assignments = f.root.Data[:]
f.close()
Now assignments is an array of shape (num_trajectories, max number of frames in any one trajectory), with each entry giving the discrete state index for that frame. There will also be entries of -1, which fill in elements where a trajectory is shorter than the longest trajectory, or where the state a frame was assigned to was removed during ergodic pruning.
You can then export the numpy assignments array to a text file or some other format for analysis, although I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow.
I hope this helps.
Josh
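For the export step mentioned above, a minimal sketch with numpy might look like the following. The small array here is made-up data standing in for the one read from Assignments.h5, and the function names and output filename are illustrative only, not part of MSMBuilder.

```python
import os
import tempfile

import numpy as np


def export_assignments(assignments, path):
    """Write one trajectory per row as space-separated integers.

    Entries of -1 are written unchanged, so the external program
    still has to skip them.
    """
    np.savetxt(path, np.asarray(assignments), fmt="%d")


def valid_frames(row):
    """Return the state indices of one trajectory with -1 entries dropped."""
    row = np.asarray(row)
    return row[row != -1]


# Made-up stand-in for the array read from Assignments.h5:
# two trajectories, the second one shorter and padded with -1.
demo = np.array([[0, 2, 2, 1],
                 [3, 3, -1, -1]])

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "assignments_demo.txt")
export_assignments(demo, out_path)
```

Each line of the resulting text file holds one trajectory, which most languages can parse without an HDF5 library. Note that dropping every -1 also drops frames whose state was removed during ergodic pruning, so the remaining frames are not necessarily contiguous in time.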
- Alex Dickson
- Posts: 3
- Joined: Fri Mar 30, 2012 2:07 pm
Re: Convert .lh5 trajectory files to readable format?
Thanks Josh! This works wonderfully.
"I find using python/numpy is usually the easiest way to go to avoid working with a bunch of different languages/formats in my analysis workflow."
I'd believe it. And I know I should learn python one day, but I'm happy that today is not that day.
Thanks again.
Alex
- Robert McGibbon
- Posts: 20
- Joined: Tue Jul 19, 2011 9:25 am
Re: Convert .lh5 trajectory files to readable format?
The easiest way to do this is to use MSMBuilder's Serializer object which wraps pytables.
In msmbuilder2.5, you can get the object from an interactive python interpreter with "from msmbuilder import Serializer". In msmbuilder2.0, you can get the object with "from msmbuilder.Serializer import Serializer".
Then you can open .h5 files such as Assignments.h5 with "Serializer.LoadFromHDF('filename.h5')".
-Robert