Problems with CustomCompoundBondForce on CUDA platform

Charles Brooks · Post by **Charles Brooks** » Sun Feb 23, 2014 11:13 am

Dear All,

In implementing more of CHARMM's functionality through the CHARMM/OpenMM interface we have recently run into problems with some code that was written to provide the CHARMM resd restraints through CHARMM/OpenMM. The implementation uses the CustomCompoundBondForce and throws errors by producing NANs as the energy output for a test system on CUDA, though it works and give the correct answers for OpenCL and Reference platforms. We have recently implemented the XmlSerializer into the CHARMM/OpenMM interface, and this provides a convenient way for us to export these examples to illustrate the failure through the OpenMM python API. In the dropbox file https://dl.dropboxusercontent.com/u/158 ... -23-14.tgz the files necessary to run this example through the OpenMM python API can be found (this is because there is no space to append the file to this message as an attachment since the a message saying "attachment quota has been reached). The failure is clearly illustrated. Any advice on what the problem might be would be appreciated.

I note that I was testing under OpenMM5.2. It appears that the treatment of integrators and contexts are slightly different under the 6.0 version I just downloaded and the following code is required:

Code: Select all

from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
from sys import stdout

plats = ['OpenCL', 'CUDA', 'Reference']

pdb = PDBFile('omm_resdtest_3.pdb')
f = open("omm_resdtest_3.xml", "r").read()
system = XmlSerializer.deserialize(f)

print "************TEST 3******************"
for w in plats:
    pltfrm = Platform.getPlatformByName(w)
    integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
    context = Context(system,integrator,pltfrm)
    context.setPositions(pdb.positions)
    state = context.getState(getEnergy=True)
    potential = state.getPotentialEnergy()
    
    platform = Platform.getName(context.getPlatform())
    print platform, potential
    
    del context, platform, pltfrm, integrator
    
del system, f, pdb

pdb = PDBFile('omm_resdtest_4a.pdb')
f = open("omm_resdtest_4a.xml", "r").read()
system = XmlSerializer.deserialize(f)

print "************TEST 4a******************"
for w in plats:
    pltfrm = Platform.getPlatformByName(w)
    integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
    context = Context(system,integrator,pltfrm)
    context.setPositions(pdb.positions)
    state = context.getState(getEnergy=True)
    potential = state.getPotentialEnergy()
    
    platform = Platform.getName(context.getPlatform())
    print platform, potential
    
    del context, platform, pltfrm, integrator
    
del system, f, pdb

pdb = PDBFile('omm_resdtest_4b.pdb')
f = open("omm_resdtest_4b.xml", "r").read()
system = XmlSerializer.deserialize(f)

print "************TEST 4b******************"
for w in plats:
    pltfrm = Platform.getPlatformByName(w)
    integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
    context = Context(system,integrator,pltfrm)
    context.setPositions(pdb.positions)
    state = context.getState(getEnergy=True)
    potential = state.getPotentialEnergy()
    
    platform = Platform.getName(context.getPlatform())
    print platform, potential
    
    del context, platform, pltfrm, integrator

Thanks,

Charles Brooks

Peter Eastman · Post by **Peter Eastman** » Mon Feb 24, 2014 12:41 pm

It looks to me like this is a CUDA bug, probably caused by it making some invalid optimizations.

I focused on your omm_resdtest_3 files. Of the four CustomCompoundBondForces in that system, the first three are all producing nan for the energy. So I looked at the first one, which is easy since it only contains one bond. Here's the energy expression:

Code: Select all

sval*kval/eval*(del)^eval;del=k1*distance(p1,p2)+k2*distance(p3,p4)-rval;

In this case, del=-0.0250377 and eval=2. But when it evaluates pow(del, eval), the result is coming out as nan. That's incorrect, since raising a negative value to an integer power is completely reasonable. But it's getting very close to an edge case, since raising a negative value to any non-integer power should, in fact, produce nan. It looks like CUDA is making a mistake in there somewhere.

Is it actually necessary for the exponent to be a parameter? If I hardcode 2 the problem goes away. In that case, OpenMM optimizes the expression and replaces the pow() function with a multiplication. Alternatively, if I replace del with abs(del) it also comes out correct. Is eval always a positive power?

Peter

Charles Brooks · Post by **Charles Brooks** » Tue Feb 25, 2014 6:53 am

Dear Peter,

Thanks for the response. No, the eval doesn't have to be a parameter, but then its a bit messier because I need to detect and make different functions for different values of eval, which is a parameter that the user can set. I was trying to minimize the code that is necessary, but can try this.

Charlie

P.S. The Xmlserializer functionality is great as it let's me take the CHARMM code and any system we can code in CHARMM to OpenMM for separate testing.

Problems with CustomCompoundBondForce on CUDA platform

Problems with CustomCompoundBondForce on CUDA platform

Re: Problems with CustomCompoundBondForce on CUDA platform

Re: Problems with CustomCompoundBondForce on CUDA platform