Implementation of Structure-Based Model energy term in combination with Amber

The functionality of OpenMM will (eventually) include everything that one would need to run modern molecular simulation.
POST REPLY
User avatar
Tobias Sikosek
Posts: 7
Joined: Fri Jul 12, 2013 2:09 pm

Implementation of Structure-Based Model energy term in combination with Amber

Post by Tobias Sikosek » Mon Jun 20, 2016 3:21 pm

Hi everyone,

based on the success of our recent approach to combine a general force field with native-centric bias (structure based model or SBM), I would like to implement an "SBM force" to be used alongside other force fields. (For the paper, see: http://journals.plos.org/ploscompbiol/a ... bi.1004960)

The general goal is to simulate protein folding and observe multiple folding/unfolding transitions, while still taking advantage of state-of-the-art, all-atom force fields.

Our current method is implemented in a Monte Carlo simulation package (Profasi) and now I would like to test it with MD (i.e. OpenMM). My experience with both MD and OpenMM, however, is limited.

Briefly, here is what I do:

I read in a list of pairwise C-alpha atom distances obtained from a known PDB structure to use them as 'CustomBondForce' with potential energy
'epsilon * ( (1-exp( -((r-d)^2)/(2*w^2) ) )-1 )'
where epsilon is the contact energy (well depth) and w is the width of a Gaussian well centred on d (the C-alpha atom distance).

Simulations are initialized with an unfolded version of the protein. For testing I have used PDB code 2FS1 which is protein A (three-helix bundle). I have tried to test it the 'quick' way, i.e. by simply trying to fold the protein in combination with Amber and OBC force field terms and at various temperatures, epsilon, and w parameter values. So far no luck.

Before I invest more time in testing, I would like to ask the community (A) if this is a suitable way to implement the force field and (B) what would be a good strategy to test it in OpenMM and, if possible, find a working set of parameter values?

Your input is much appreciated! Thanks.

Tobias

----

Here the full Python script, see https://www.dropbox.com/s/4qfgdbg17xisr ... t.zip?dl=0 for input files.
Example usage from ipython (T=300K, epsilon=10, w=10):
run omm_sbm.py 2FS1_m1_unfolded01.pdb restraints.txt 300 10 10 OpenCL:0

Code: Select all

from __future__ import print_function
from simtk.openmm import app
import simtk.openmm as mm
from simtk import unit
from sys import stdout,argv
import openmmtools.integrators as grators
from parmed.openmm import energy_decomposition,load_topology
import mdtraj as md
import parmed

pdb = app.PDBFile(argv[1])
contactfile = argv[2]
temperature = float(argv[3])
epsilon = float(argv[4])
width = float(argv[5])
pform = argv[6]

nsteps = 200000000

doSBM = True
solvent = 'implicit'
assert solvent in ['explicit', 'implicit']

# see: Sikosek T, Krobath H, Chan HS. Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches. Jernigan RL, editor. PLOS Comput Biol. 2016;12: e1004960. doi:10.1371/journal.pcbi.1004960
function_string = 'epsilon * ( (1-exp( -((r-d)^2)/(2*w^2)  ) )-1 )'

devices = "0"

if ":" in pform:
	tmp = pform.strip().split(":")
	pform = tmp[0]
	devices = tmp[1]

pH = 7.0

print('Setting up topology...')
if solvent == 'explicit':
	forcefield = app.ForceField('amber99sbildn.xml')
else:
	forcefield = app.ForceField('amber99sbildn.xml', 'amber99_obc.xml')

modeller = app.Modeller(pdb.topology, pdb.positions)
modeller.addHydrogens(forcefield,pH=pH)

if solvent == 'explicit':
	modeller.addSolvent(forcefield, padding=1.0*unit.nanometers, ionicStrength=0.1*unit.molar)
else:
	pdb.topology.setPeriodicBoxVectors([[30,0,0], [0,30,0], [0,0,30]])

print('Creating system...')

system = forcefield.createSystem(pdb.topology, nonbondedMethod=app.CutoffPeriodic, 
    nonbondedCutoff=1.0*unit.nanometers, constraints=app.AllBonds, rigidWater=True,hydrogenMass=4*unit.amu)

if doSBM:
	print('SBM force')
	sbm_force = mm.CustomBondForce( function_string )
	sbm_force.addPerBondParameter('epsilon')
	sbm_force.addPerBondParameter('d')
	sbm_force.addPerBondParameter('w')
	
	
	# collect internal atom names and ids to verify input restraints
	tp = pdb.getTopology()
	atid = []
	atnm= []
	for i in tp.atoms():
		atid.append(int(i.id))
		atnm.append(i.name)

	with open(contactfile) as input_file:
		print ("looking for SBM restraints...")
		for line in input_file:
			columns = line.split()
			atom_index_i = int(columns[0])
			atom_index_j = int(columns[1])
			#print('%i %s %s'%(atom_index_i, atid[atom_index_i-1], atnm[atom_index_i-1]))
			#print('	%i %s %s'%(atom_index_j, atid[atom_index_j-1], atnm[atom_index_j-1]))
			assert atom_index_i == atid[atom_index_i-1], str((atom_index_i, atid[atom_index_i-1]))
			assert atnm[atom_index_i-1] == 'CA'
		
			assert atom_index_j == atid[atom_index_j-1]
			assert atnm[atom_index_j-1] == 'CA'
		
			epsilon = epsilon
			d = float(columns[3])
			w = width
			
			sbm_force.addBond(atom_index_i, atom_index_j, [epsilon,d,w])
			
			print ("%i %i %s"%(atom_index_i, atom_index_j, str([epsilon,d,w])))
	
	print ( "%i custom bonds"%(sbm_force.getNumBonds() ))
	
	forcefield.registerGenerator(mm.CustomBondForce)
	
	system.addForce(sbm_force)
	
	for i in xrange(system.getNumForces()):
		fi = system.getForce(i)
		print (type(fi), fi.getForceGroup())
	print (system.getNumForces(),"energy terms")


if solvent == 'explicit':
	system.addForce(mm.AndersenThermostat(temperature*unit.kelvin, 1/unit.picosecond))
	integrator = mm.VerletIntegrator(5.0*unit.femtoseconds)
else:
	integrator = mm.LangevinIntegrator(temperature*unit.kelvin, 1.0/unit.picoseconds, 5.0*unit.femtoseconds)
integrator.setConstraintTolerance(0.00001)

platform = mm.Platform.getPlatformByName(pform)

simulation = app.Simulation(pdb.topology, system, integrator, platform)

if pform=="OpenCL":
	platform.setPropertyValue(simulation.context, "OpenCLDeviceIndex", devices)
elif pform == "CUDA":
	platform.setPropertyValue(simulation.context, "CUDADeviceIndex", devices)	
	
simulation.context.setPositions(pdb.positions)

print('Minimizing...')
simulation.minimizeEnergy()

simulation.context.setVelocitiesToTemperature(temperature*unit.kelvin)
print('Equilibrating...')
simulation.step(1000)

simulation.reporters.append(app.PDBReporter('trajectory.pdb', 5000))
simulation.reporters.append(app.StateDataReporter(stdout, 10000, step=True, 
    potentialEnergy=True, kineticEnergy=True, temperature=True, volume=True, density=True, 
    progress=True, remainingTime=True, speed=True, totalSteps=20000000, 
    separator='\t'))
simulation.reporters.append(app.CheckpointReporter('checkpnt.chk', 10000))

print('Running Production...')

simulation.step(nsteps)

print('Done!')

User avatar
Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Peter Eastman » Tue Jun 21, 2016 10:49 am

I can't comment too much on the method, but the way you've implemented it looks reasonable. The challenge will be to bias it toward the native structure without creating barriers. That will require optimizing the parameters. My guess is that you'll want the gaussians to be fairly wide and shallow, so they just provide a gentle push in the right direction. Then again, if you make them narrow but deep it will function a lot like a Go model: they'll have no effect until two correct atoms happen to come together, and then they'll stick. So I'm really not sure.

You can remove this line from your script:

Code: Select all

   forcefield.registerGenerator(mm.CustomBondForce)
CustomBondForce is a force class, not a generator.

Peter

User avatar
Tobias Sikosek
Posts: 7
Joined: Fri Jul 12, 2013 2:09 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Tobias Sikosek » Tue Jun 21, 2016 11:16 am

Thanks for having a look at the code! So it seems that I might be on the right track. I'll continue to play with the parameters. At least I'm getting excellent performance on our dual GTX 680 machine with CUDA and implicit solvent. >1000 ns/day

Are you aware of any plugin that would allow me to track observables related to the SBM term during runtime, such as the potential energy of the SBM or atomic distances?

User avatar
Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Peter Eastman » Tue Jun 21, 2016 12:04 pm

To query the SBM potential energy, put the CustomBondForce into its own force group:

Code: Select all

sbm_force.setForceGroup(1)
Then you can query the energy of just that group:

Code: Select all

energy = simulation.context.getState(getEnergy=True, groups={1}).getPotentialEnergy()
Likewise if you want to know atom distances, query the atom positions and calculate them:

Code: Select all

positions = simulation.context.getState(getPositions=True).getPositions()
distance01 = unit.norm(positions[0]-positions[1])
Peter

User avatar
Tobias Sikosek
Posts: 7
Joined: Fri Jul 12, 2013 2:09 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Tobias Sikosek » Tue Jun 21, 2016 2:17 pm

Thanks! That helps a lot.

Should I expect 1 kJ/mol for each 'bond' formed, i.e. when C-alpha atoms are at native distance, when epsilon=1 ?

User avatar
Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Peter Eastman » Tue Jun 21, 2016 2:53 pm

Correct.

Peter

User avatar
Tobias Sikosek
Posts: 7
Joined: Fri Jul 12, 2013 2:09 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Tobias Sikosek » Wed Jul 06, 2016 1:06 pm

Hi Peter,

I have a follow-up question to this thread.

I observe that my simulations gradually drop in ns/day, starting at around 1000 and then dropping to below 40 at the end of a 100 ns run. This is with CUDA running on two GTX 680. Any idea what that could mean?

Thanks,
Tobias

User avatar
Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: Implementation of Structure-Based Model energy term in combination with Amber

Post by Peter Eastman » Wed Jul 06, 2016 1:24 pm

I can think of a couple of possibilities. First, I see from the script you posted earlier that you're using ParmEd. By any chance are you using its NetCDFReporter? That has a known memory leak which causes performance to gradually go down with time. I'd recommend using a different reporter like DCDReporter instead.

The other possibility is that a large conformational change is happening in your system that causes the neighbor list to become less efficient. This can happen in the CUDA and OpenCL platforms because of the particular way they group atoms together for calculating interactions. Performance tends to be somewhat better for folded proteins than unfolded ones. However, this should be more like a 2x difference, not a 25x difference. So the memory leak seems much more likely.

If you were seeing a sudden drop in performance instead of a gradual one, I would suspect the simulation had blown up and all the coordinates had become nan. But that doesn't fit the behavior you described.

Peter

POST REPLY