6.3 Error - Spearate indices parsed into a single index

George Pantelopulos · Post by **George Pantelopulos** » Tue Sep 29, 2015 5:05 pm

Hi all,

I'm trying to use GMX gro and topology (+itp) files generated by CHARMM-GUI for lipid bilayer simulations in Openmm 6.3 and I'm getting a remarkably bizarre error in which all of the separate residues in the gro file are being strung together into a single residue in the topology, with each separate residue becoming a chain, after being read in by Openmm.

Here you can see that, after minimizing the system, the water molecules have seemingly been treated as one long chain:

To give a sample of what is contained in the .gro file generated by CHARMM-GUI:
56SSM C18F34101 8.650 1.140 3.797
56SSM H18F34102 8.671 1.244 3.829
56SSM H18G34103 8.660 1.096 3.898
56SSM H18H34104 8.552 1.105 3.764
1TIP3 OH234105 4.591 4.639 5.920
1TIP3 H134106 4.672 4.672 5.959
1TIP3 H234107 4.531 4.713 5.927

And the PDB that openmm writes...
ATOM 34380 C18F SSM T 1 83.291 12.590 35.141 1.00 0.00 C
ATOM 34381 H18F SSM T 1 83.539 13.669 35.233 1.00 0.00 H
ATOM 34382 H18G SSM T 1 83.092 12.192 36.159 1.00 0.00 H
ATOM 34383 H18H SSM T 1 82.350 12.450 34.567 1.00 0.00 H
TER 34384 SSM T 1
ATOM 34385 O HOH U 1 40.593 45.727 62.474 1.00 0.00 O
ATOM 34386 H1 HOH U 1 50.369 53.587 58.493 1.00 0.00 H
ATOM 34387 H2 HOH U 1 50.388 53.569 58.527 1.00 0.00 H
TER 34388 HOH U 1
And despite these TER arguments, all atoms in here seem to be treated as a single molecule regardless, as suggested by all atoms having the same residue index.

The molecules in the GMX topology file:
[ molecules ]
; Compound #mols
CHL1 56
POPC 168
SSM 56
TIP3 9258
SOD 30
CLA 30

Where TIP3 is TIP3P water, SOD is sodium, CLA is chloride.

Thank you for any assistance, this is very odd!
George

Peter Eastman · Post by **Peter Eastman** » Wed Sep 30, 2015 10:47 am

Hi George,

I'm not sure exactly what problem you're describing. Are you just referring to the fact that the waters all have residue ID 1? Residue IDs are specific to a particular chain, not global. So two different chains can each have a residue 1.

Peter

George Pantelopulos · Post by **George Pantelopulos** » Wed Sep 30, 2015 12:23 pm

Hi Peter,

The GRO file which is being read by GromacsGroFile contains many molecules, particularly molecules and lipids, of different residue indices.

GromacsGroFile seems to be interpreting all atoms everything as belonging to a single residue index. If everything had a qnique chain identifier this might be OK... But it seems that these chain identifiers are being repeated while all atoms are in the same residue index. And so it appears that many atoms that should be separated are being treated as one molecule despite being separated by a TER argument. At least, this may explain why, during minimization, it appears that atoms within each chain of the same identifier are being linked together to form the odd structure I posted above.

Taking a couple slices out of the starting gro file...
...
17POPC H16X 6420 2.853 4.595 3.731
17POPC H16Y 6421 2.744 4.731 3.698
17POPC H16Z 6422 2.754 4.573 3.612
...
1TIP3 OH234105 4.591 4.639 5.920
1TIP3 H134106 4.672 4.672 5.959
1TIP3 H234107 4.531 4.713 5.927
2TIP3 OH234108 5.567 6.638 6.148
2TIP3 H134109 5.554 6.732 6.165
2TIP3 H234110 5.612 6.635 6.063
3TIP3 OH234111 5.751 6.762 5.957
3TIP3 H134112 5.765 6.838 5.898
3TIP3 H234113 5.766 6.802 6.044

And a few slices out of the PDB file that is written out by Openmm based on the GROMACS topology and simulation coordinates...
...
ATOM 6492 H16X POP U 1 28.979 44.815 37.326 1.00 0.00 H
ATOM 6493 H16Y POP U 1 27.949 45.743 36.227 1.00 0.00 H
ATOM 6494 H16Z POP U 1 28.397 44.072 35.757 1.00 0.00 H
TER 6495 POP U 1
...
ATOM 34385 O HOH U 1 40.593 45.727 62.474 1.00 0.00 O
ATOM 34386 H1 HOH U 1 50.369 53.587 58.493 1.00 0.00 H
ATOM 34387 H2 HOH U 1 50.388 53.569 58.527 1.00 0.00 H
TER 34388 HOH U 1
...
ATOM 34697 O HOH U 1 52.443 81.380 20.454 1.00 0.00 O
ATOM 34698 H1 HOH U 1 59.447 81.146 21.597 1.00 0.00 H
ATOM 34699 H2 HOH U 1 59.554 81.147 21.568 1.00 0.00 H
TER 34700 HOH U 1

Is this clear? Any idea of how to approach this?

Peter Eastman · Post by **Peter Eastman** » Wed Sep 30, 2015 12:54 pm

It's hard to be sure the forum hasn't mangled the formatting, but it looks to me like your GRO file only has four columns for the residue number. Could you check that? There are supposed to be five columns (so an additional space at the start of each line). See http://manual.gromacs.org/current/online/gro.html.

If that's not the problem, could post the actual file so I can take a look at it?

Peter

George Pantelopulos · Post by **George Pantelopulos** » Wed Sep 30, 2015 1:28 pm

Ahh, maybe that has something to do with it...

The gro file describes a somewhat large system of 61938 particles.

This GRO file has 5 columns for atom numbers, which is standard according to the gromacs manual. This is not necessarily mangled... but it leaves no space between the atom name and the atom number which I can imagine could open up some sort of can of worms with parsers.

I've attached the gro file to this message. Thanks again for your help and your patience!

Peter Eastman · Post by **Peter Eastman** » Wed Sep 30, 2015 2:57 pm

Could you post the top file too? That's what it builds the Topology from.

Peter

George Pantelopulos · Post by **George Pantelopulos** » Wed Sep 30, 2015 3:12 pm

Yep! The .itp files are within as well.

topology_files.tar.gz: (23.18 KiB) Downloaded 46 times

Peter Eastman · Post by **Peter Eastman** » Wed Sep 30, 2015 3:55 pm

When I try to load it, I get this error:

Code: Select all

ValueError: Could not locate #include file: SOD.itp

It looks like that file is missing?

Peter

George Pantelopulos · Post by **George Pantelopulos** » Wed Sep 30, 2015 4:22 pm

Oh my apologies! That itp is with the others now.

Peter Eastman · Post by **Peter Eastman** » Thu Oct 01, 2015 10:28 am

Hi George,

I tested your files, and they seem to be working correctly. In this case, "correctly" means, "As correctly as can be expected under the circumstances."

First, I loaded both files:

Code: Select all

>>> top = GromacsTopFile('topol.top.txt')
>>> gro = GromacsGroFile('step5_charmm2gmx.gro.txt')

The Topology is created based on the TOP file. The GRO file is used only for coordinates. In this case, the Topology includes 9598 chains:

Code: Select all

>>> len(list(top.topology.chains()))
9598

That matches the number of molecules defined in the TOP file:

Code: Select all

[ molecules ]
; Compound	#mols
CHL1	          56
POPC	         168
SSM	          56
TIP3	        9258
SOD	          30
CLA	          30

I then had it print out a PDB file:

Code: Select all

>>> PDBFile.writeFile(top.topology, gro.positions)

Unfortunately, the PDB format is terribly outdated and not at all designed to be used in the ways people use it today. When writing a PDB file, you're supposed to distinguish the chains in two different ways: by giving a unique identifier to each one, and by including a TER record at the end of each one. But there's only a single column for the identifier! When the format was first defined, they were thinking about multiple protein chains in a complex, so the number of them would always be very small. They weren't thinking of including thousands of molecules in a single PDB file. If each chain identifier is only a single letter, there's no way to give a unique identifier to each of thousands of molecules. So OpenMM does the best it can, and just keeps looping through the alphabet:

Code: Select all

ATOM  53017  O   HOH Y   1      53.730  79.270  75.080  1.00  0.00           O  
ATOM  53018  H1  HOH Y   1      54.320  78.810  75.690  1.00  0.00           H  
ATOM  53019  H2  HOH Y   1      52.870  78.820  75.190  1.00  0.00           H  
TER   53020      HOH Y   1
ATOM  53021  O   HOH Z   1      55.770  78.100  76.610  1.00  0.00           O  
ATOM  53022  H1  HOH Z   1      56.180  77.400  76.050  1.00  0.00           H  
ATOM  53023  H2  HOH Z   1      56.500  78.750  76.670  1.00  0.00           H  
TER   53024      HOH Z   1
ATOM  53025  O   HOH A   1      39.340  76.140  77.660  1.00  0.00           O  
ATOM  53026  H1  HOH A   1      39.830  76.750  77.080  1.00  0.00           H  
ATOM  53027  H2  HOH A   1      38.750  76.750  78.110  1.00  0.00           H  
TER   53028      HOH A   1
ATOM  53029  O   HOH B   1      41.190  78.090  76.090  1.00  0.00           O  
ATOM  53030  H1  HOH B   1      41.700  77.410  75.610  1.00  0.00           H  
ATOM  53031  H2  HOH B   1      41.820  78.370  76.780  1.00  0.00           H  
TER   53032      HOH B   1

The much newer PDBx/mmCIF format fixes this problem by allowing an arbitrary length identifier for each chain. So if you write that format instead, OpenMM can give an actual unique identifier to each chain.

Peter

6.3 Error - Spearate indices parsed into a single index

6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index

Re: 6.3 Error - Spearate indices parsed into a single index