GRO file format parser in Python

Tobias Sikosek · Post by **Tobias Sikosek** » Tue Feb 25, 2014 9:32 pm

Hi everyone,

I just installed OpenMM 6.0 and the tests all passed fine.

Then I tried importing a GRO file in a Python script and got this error:

Code: Select all

Traceback (most recent call last):
  File "omm_test.py", line 6, in <module>
    gro = GromacsGroFile(sys.argv[1]) 
  File "/Library/Python/2.7/site-packages/simtk/openmm/app/gromacsgrofile.py", line 149, in __init__
    raise Exception("Unexpected line in .gro file: "+line)
Exception: Unexpected line in .gro file:     1  THR    N    1   6.503   7.301  12.753

When looking into the Python code of gromacsgrofile.py I found the following method to recognize atom lines in a GRO file:

Code: Select all

def _is_gro_coord(line):
    """ Determines whether a line contains GROMACS data or not

    @param[in] line The line to be tested

    """
    sline = line.split()
    if len(sline) == 6 or len(sline) == 9:
        return all([_isint(sline[2]), _isfloat(sline[3]), _isfloat(sline[4]), _isfloat(sline[5])])
    elif len(sline) == 5 or len(sline) == 8:
        return all([_isint(line[15:20]), _isfloat(sline[2]), _isfloat(sline[3]), _isfloat(sline[4])])
    else:
        return 0

According to this, there should be either 5,6,8, or 9 columns. However, my GRO file has 7 columns.

The Gromacs manual says: (http://manual.gromacs.org/online/gro.html)

Columns contain the following information (from left to right):
residue number (5 positions, integer)
residue name (5 characters)
atom name (5 characters)
atom number (5 positions, integer)
position (in nm, x y z in 3 columns, each 8 positions with 3 decimal places)
velocity (in nm/ps (or km/s), x y z in 3 columns, each 8 positions with 4 decimal places)

Since velocities are not present in my file, 7 columns seems to be ok.

Is there there something I'm missing or is this a bug?

Cheers!
Tobias

Lee-Ping Wang · Post by **Lee-Ping Wang** » Tue Feb 25, 2014 10:36 pm

Hi Tobias,

Thanks for pointing this out. I implemented the GRO file parser and there could be an issue with it, but in this case I think it's working properly.

I think what's happening is that the residue number is usually right-indented while the residue name is usually left-indented so it looks like "1THR", whereas in your case you have right-indented both number and name so it looks like "1 THR". This separates the residue number and name with spaces and creates the extra field that confuses the parser.

The manual specifies the indentation (note the dash in the second field):

Code: Select all

    "%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f"

Thus, if you left-indent the residue name the parser should work.

That said, since .gro is fixed format, the functions _is_gro_box and _is_gro_coord might still fail if the fields are five characters long (i.e. there is no space to separate the fields.) Thus, these functions should probably be removed, and the parser would then look something like this:

Code: Select all

        for line in open(file):
            if ln == 0:
                comms.append(line.strip())
            elif ln == 1:
                na = int(line.strip())
            elif ln == na + 2:
                sline = line.split()
                boxes.append(tuple([float(i) for i in sline])*nanometers)
                xyzs.append(xyz*nanometers)
                xyz = []
                ln = -1
                frame += 1
            else:
                if frame == 0: # Create the list of residues, atom names etc. only if it's the first frame.
                    (thisresnum, thisresname, thisatomname) = [line[i*5:i*5+5].strip() for i in range(3)]
                    resname.append(thisresname)
                    resid.append(int(thisresnum))
                    atomname.append(thisatomname)
                    thiselem = thisatomname
                    if len(thiselem) > 1:
                        thiselem = thiselem[0] + sub('[A-Z0-9]','',thiselem[1:])
                        try:
                            elements.append(elem.get_by_symbol(thiselem))
                        except KeyError:
                            elements.append(None)
                firstDecimalPos = line.index('.', 20)
                secondDecimalPos = line.index('.', firstDecimalPos+1)
                digits = secondDecimalPos-firstDecimalPos
                pos = [float(line[20+i*digits:20+(i+1)*digits]) for i in range(3)]
                xyz.append(Vec3(pos[0], pos[1], pos[2]))
            ln += 1

Try this out and let me know if it works.

Thanks,

- Lee-Ping

Tobias Sikosek · Post by **Tobias Sikosek** » Wed Feb 26, 2014 9:24 am

Hi Lee-Ping!

Thanks for the quick reply. You're absolutely right about the column alignments.
My GRO file was generated by some other tool, not Gromacs. That would explain the difference.

I have tested the code you have posted and it works with my GRO file.

I think the parsing is a little more robust this way even though it is absolutely correct in its current form.

I'm now getting an error when parsing the TOP file. It was generated by the same tool as my GRO file, so probably it's a similar problem. I will investigate and maybe open a new thread if need be.

Tobias

Lee-Ping Wang · Post by **Lee-Ping Wang** » Wed Feb 26, 2014 9:28 am

Cool. May I ask which tool this is?

If I recall correctly, .top is not a fixed with format and there are no indentation issues, but there are many more places things could go wrong.

Tobias Sikosek · Post by **Tobias Sikosek** » Wed Feb 26, 2014 10:14 am

I'll just continue in this thread since it's related.

The exception I get with the TOP file is:

Code: Select all

Traceback (most recent call last):
  File "omm_test.py", line 7, in <module>
    top = GromacsTopFile(sys.argv[2], unitCellDimensions=gro.getUnitCellDimensions(), includeDir='/usr/local/gromacs/share/gromacs/top') 
  File "/Library/Python/2.7/site-packages/simtk/openmm/app/gromacstopfile.py", line 399, in __init__
    self._processFile(file)
  File "/Library/Python/2.7/site-packages/simtk/openmm/app/gromacstopfile.py", line 71, in _processFile
    self._processLine(append+' '+line, file)
  File "/Library/Python/2.7/site-packages/simtk/openmm/app/gromacstopfile.py", line 152, in _processLine
    self._processDefaults(line)
  File "/Library/Python/2.7/site-packages/simtk/openmm/app/gromacstopfile.py", line 190, in _processDefaults
    raise ValueError('Too few fields in [ defaults ] line: '+line);
ValueError: Too few fields in [ defaults ] line:             1           1 no

My defaults line reads:

Code: Select all

[ defaults ]
;nbfunc comb-rule gen-pairs
           1           1 no

The tool that generates these files is SMOG (http://smog-server.org). The files work fine in Gromacs but they might look a little different than most TOP files that are used with OpenMM.

The code in gromacstopfile.py is :

Code: Select all

    def _processDefaults(self, line):
        """Process the [ defaults ] line."""
        fields = line.split()
        if len(fields) < 4:
            raise ValueError('Too few fields in [ defaults ] line: '+line);
        if fields[0] != '1':
            raise ValueError('Unsupported nonbonded type: '+fields[0])
        if fields[1] != '2':
            raise ValueError('Unsupported combination rule: '+fields[1])
        if fields[2].lower() == 'no':
            raise ValueError('gen_pairs=no is not supported')
        self._defaults = fields

So "gen_pairs=no" is not supported? Is there any particular reason for that or has it just not been implemented yet?

GRO file format parser in Python

GRO file format parser in Python

Re: GRO file format parser in Python

Re: GRO file format parser in Python

Re: GRO file format parser in Python

Re: GRO and TOP file format parser in Python