array deletion error

The functionality of OpenMM will (eventually) include everything that one would need to run modern molecular simulation.
Saurabh Belsare
Posts: 32
Joined: Sat Aug 14, 2010 8:43 am

Post by Saurabh Belsare » Sun Dec 11, 2016 5:41 pm

Hi,

I'm running a version of OpenMM 7 compiled from the source on GitHub. I've been running it for a few months without any noticeable issues. However, the current set of simulations I'm trying to run gives me the following error:

Traceback (most recent call last):
  File "equilibrate_npt.py", line 26, in <module>
    simulation.step(writeFreq)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 194, in _simulate
    self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 3552, in step
    return _openmm.LangevinIntegrator_step(self, steps)
Exception: Error downloading array diisMatrix: CUDA error (700)
Exception Exception: 'Error deleting array bondParams: CUDA error (700)' in <built-in function delete_Context> ignored
Exception Exception: 'Error deleting array langevinParams: CUDA error (700)' in <built-in function delete_LangevinIntegrator> ignored


The simulations are of a protein-in-water system with ~30K atoms, run in the NPT ensemble at 300 K and 1 atm with the Langevin integrator and a MonteCarloBarostat, using the AMOEBA force field and a 1 fs timestep. The simulation is 5 ns long. I have previously run simulations of a different protein with an identical setup for up to ~1 ns, and I did not see this problem. In this set of simulations, however, jobs are dying between 1 and 3 ns of simulation time with the error above. What might be the issue here?

Regards.

Peter Eastman
Posts: 2544
Joined: Thu Aug 09, 2007 1:25 pm

Re: array deletion error

Post by Peter Eastman » Mon Dec 12, 2016 1:20 pm

What version of the code is this compiled from? Is it 7.0.1? Or something more recent than that?

Peter

Re: array deletion error

Post by Saurabh Belsare » Mon Dec 12, 2016 4:18 pm

I'm not sure of the exact version, but we pulled the source from GitHub in early July.

Saurabh

Re: array deletion error

Post by Peter Eastman » Mon Dec 12, 2016 4:20 pm

Could you try getting the latest source and see if it still happens? It might be something that's already been fixed. If not, we can investigate further.

Peter

Re: array deletion error

Post by Saurabh Belsare » Tue Dec 20, 2016 2:03 pm

Hi Dr. Eastman,

I've tried compiling the latest source from GitHub, and I'm still seeing exactly the same problem. The problem is also quite random, i.e. it occurs in some trajectories and not in others.

Saurabh

Re: array deletion error

Post by Saurabh Belsare » Mon Jan 02, 2017 1:35 pm

Hi Dr. Eastman,

I'm still seeing this problem. I tried different proteins, as well as water boxes of different sizes, and the error still occurs. It is also stochastic, i.e. if I restart the exact same simulation, the error occurs at a different point in the trajectory.

In addition, some of the other trajectories I have, which are very similar in sequence to the ones above (only different by <10 residues in a ~250 residue protein), are showing this error:

Traceback (most recent call last):
  File "equilibrate_npt.py", line 26, in <module>
    simulation.step(writeFreq)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 219, in _simulate
    reporter.report(self, state)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/pdbreporter.py", line 92, in report
    PDBFile.writeModel(simulation.topology, state.getPositions(), self._out, self._nextModel)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/pdbfile.py", line 371, in writeModel
    recordName, atomIndex%100000, atomName, resName, chainName, resId, _format_83(coords[0]),
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/pdbfile.py", line 454, in _format_83
    'in a width-8 field' % f)
ValueError: coordinate "-291498111.632" could not be represented in a width-8 field


What might the issues be?

Re: array deletion error

Post by Peter Eastman » Fri Jan 06, 2017 2:00 pm

That one looks like your simulation has blown up and the coordinates have become enormous values. It's hard to guess exactly why. I don't suppose you can post files that would let me reproduce the error? Although if it takes hours to reproduce and isn't deterministic, that's still going to be quite a challenge to debug.

Peter
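
A blow-up like the one described above could be caught before the PDB reporter fails. The following is only a minimal sketch in plain Python 3, not part of OpenMM: `fits_pdb_field` and `find_bad_coordinates` are hypothetical helper names, and the fixed-point format with three decimals is an assumption inferred from the "width-8 field" wording in the error message. It expects plain (x, y, z) tuples in angstroms.

```python
import math

def fits_pdb_field(value, width=8, decimals=3):
    """Return True if value is finite and its fixed-point text fits the field.

    The width and decimals defaults are assumptions based on the error message.
    """
    if not math.isfinite(value):
        return False
    return len("%.*f" % (decimals, value)) <= width

def find_bad_coordinates(positions):
    """Return indices of atoms with any coordinate that would not fit."""
    return [i for i, xyz in enumerate(positions)
            if not all(fits_pdb_field(c) for c in xyz)]

# Illustrative data: one ordinary atom and one with the value from the traceback.
positions = [(12.345, -7.890, 3.210), (-291498111.632, 0.0, 1.0)]
bad = find_bad_coordinates(positions)
print(bad)
```

Running a check of this kind at each reporting interval would show which atoms diverged first, which may help narrow down where the instability starts.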

Re: array deletion error

Post by Saurabh Belsare » Thu Jan 12, 2017 2:32 pm

Hi Dr. Eastman,

The input files can be found here:
https://www.dropbox.com/sh/ytttz7ju0y8h ... 6th2a?dl=0
This simulation died with the CUDA error before the first restart file was written, i.e. in under 10 ps. However, I'm not sure whether this is deterministic or whether it would die at exactly the same spot again.

Re: array deletion error

Post by Peter Eastman » Thu Jan 12, 2017 4:16 pm

I think I've fixed the problem: https://github.com/pandegroup/openmm/pull/1715. Could you give it a try with the latest GitHub revision?

Peter

Re: array deletion error

Post by Saurabh Belsare » Wed Jan 18, 2017 12:05 pm

We downloaded the source from GitHub yesterday and recompiled it, and I submitted two test jobs. One of them died with the same CUDA 700 error I mentioned in my first post in this thread, and the other died with the following error:

Traceback (most recent call last):
  File "equilibrate_npt.py", line 26, in <module>
    simulation.step(writeFreq)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 194, in _simulate
    self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
  File "/opt/python/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 3566, in step
    return _openmm.LangevinIntegrator_step(self, steps)
Exception: The periodic box size has decreased to less than twice the nonbonded cutoff.


This seems unlikely to have actually happened, since the cutoff I'm using is 1 nm while the box is ~70 Å (7 nm) on a side and contains ~40K atoms.
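
The arithmetic behind that doubt can be sketched as follows. This is a hedged illustration based only on the wording of the exception (box size compared against twice the cutoff): `box_is_large_enough` is a hypothetical helper, not an OpenMM function, and treating the box as cubic is an assumption.

```python
def box_is_large_enough(box_edges_nm, cutoff_nm):
    """True if every box edge is at least twice the nonbonded cutoff."""
    return min(box_edges_nm) >= 2.0 * cutoff_nm

ANGSTROM_TO_NM = 0.1
edge_nm = 70.0 * ANGSTROM_TO_NM   # ~70 angstrom box edge quoted in the post
cutoff_nm = 1.0                   # 1 nm cutoff quoted in the post

print(box_is_large_enough([edge_nm] * 3, cutoff_nm))

# Fraction of the starting volume at which a cubic box would reach the limit.
shrink = (2.0 * cutoff_nm / edge_nm) ** 3
print(round(shrink, 4))
```

Under these assumptions the check would only fail once the box had shrunk to roughly 2% of its starting volume, which suggests the reported box size was already invalid rather than the result of ordinary barostat fluctuations.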
