array deletion error

Posted: Sun Dec 11, 2016 5:41 pm
by saurabhbelsare
Hi,

I'm running OpenMM 7, compiled from the source on GitHub. I've been using it for a few months without any noticeable issues. However, the current set of simulations I'm trying to run gives me the following error:

Traceback (most recent call last):
File "equilibrate_npt.py", line 26, in <module>
simulation.step(writeFreq)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
self._simulate(endStep=self.currentStep+steps)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 194, in _simulate
self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 3552, in step
return _openmm.LangevinIntegrator_step(self, steps)
Exception: Error downloading array diisMatrix: CUDA error (700)
Exception Exception: 'Error deleting array bondParams: CUDA error (700)' in <built-in function delete_Context> ignored
Exception Exception: 'Error deleting array langevinParams: CUDA error (700)' in <built-in function delete_LangevinIntegrator> ignored
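In logs like this one, only the first CUDA error is likely to be informative. In the CUDA driver API, error 700 is CUDA_ERROR_ILLEGAL_ADDRESS: a kernel accessed an invalid memory address, which leaves the process's CUDA state unusable. The later "Error deleting array" messages are therefore most likely cleanup failures as Python tears down the Context and the integrator, not separate problems. The sketch below assumes log lines in exactly the format quoted above, and the regular expression is written for that format only. It extracts the errors in order and treats the first one as the likely root cause:

```python
import re

# Matches messages such as "Error downloading array diisMatrix: CUDA error (700)".
# The pattern is an assumption based only on the messages quoted in this thread.
CUDA_ERROR_RE = re.compile(r"Error (\w+) array (\w+): CUDA error \((\d+)\)")

# Small subset of CUDA driver API (CUresult) error codes.
CUDA_DRIVER_ERRORS = {
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    700: "CUDA_ERROR_ILLEGAL_ADDRESS",
}


def cuda_errors(log_text):
    """Return (operation, array, code, name) tuples in the order they appear."""
    found = []
    for match in CUDA_ERROR_RE.finditer(log_text):
        code = int(match.group(3))
        name = CUDA_DRIVER_ERRORS.get(code, "unknown")
        found.append((match.group(1), match.group(2), code, name))
    return found


log = """\
Exception: Error downloading array diisMatrix: CUDA error (700)
Exception Exception: 'Error deleting array bondParams: CUDA error (700)' in <built-in function delete_Context> ignored
Exception Exception: 'Error deleting array langevinParams: CUDA error (700)' in <built-in function delete_LangevinIntegrator> ignored
"""

errors = cuda_errors(log)
root_cause = errors[0]  # the "deleting" errors come from cleanup after this failure
print(root_cause)
```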


The simulations are of a protein in water (~30K atoms) in the NPT ensemble at 300 K and 1 atm. They use the AMOEBA force field, the LangevinIntegrator with a 1 fs timestep, and a MonteCarloBarostat. Each simulation is 5 ns long. I have previously run simulations of a different protein with an identical setup for up to ~1 ns, and this problem did not occur. In this set of simulations, however, the jobs die between 1 and 3 ns of simulation time with the error above. What might be the issue here?
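The numbers in this post give a sense of scale. At a 1 fs timestep, a 5 ns run is 5 million steps. A failure between 1 and 3 ns therefore comes only after 1 to 3 million error-free steps, which points to a rare event rather than something wrong at startup. Since the traceback shows Simulation._simulate() calling integrator.step(10), that is also 100,000 to 300,000 separate calls. The helper name steps_for is hypothetical:

```python
def steps_for(duration_ns, timestep_fs):
    """Number of integrator steps needed to cover duration_ns at timestep_fs."""
    femtoseconds_per_ns = 1000000
    return int(round(duration_ns * femtoseconds_per_ns / float(timestep_fs)))


total = steps_for(5, 1)          # full 5 ns run at 1 fs
first_failure = steps_for(1, 1)  # earliest reported failure (~1 ns)
last_failure = steps_for(3, 1)   # latest reported failure (~3 ns)
# Simulation._simulate() advances the integrator 10 steps per call (see traceback)
chunks_before_failure = (first_failure // 10, last_failure // 10)
print(total, first_failure, last_failure, chunks_before_failure)
```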

Regards.

Re: array deletion error

Posted: Mon Dec 12, 2016 1:20 pm
by peastman
What version of the code is this compiled from? Is it 7.0.1? Or something more recent than that?

Peter

Re: array deletion error

Posted: Mon Dec 12, 2016 4:18 pm
by saurabhbelsare
I'm not sure of the exact version, but we pulled the source from GitHub in early July.

Saurabh

Re: array deletion error

Posted: Mon Dec 12, 2016 4:20 pm
by peastman
Could you try getting the latest source and see if it still happens? It might be something that's already been fixed. If not, we can investigate further.

Peter

Re: array deletion error

Posted: Tue Dec 20, 2016 2:03 pm
by saurabhbelsare
Hi Dr. Eastman,

I compiled the latest source from GitHub, and I'm still seeing exactly the same problem. The problem also appears to be random: it occurs in some trajectories and not in others.

Saurabh

Re: array deletion error

Posted: Mon Jan 02, 2017 1:35 pm
by saurabhbelsare
Hi Dr. Eastman,

I'm still seeing this problem. I tried different proteins as well as water boxes of different sizes, and the error persists. It is also stochastic: if I restart the exact same simulation, the error occurs at a different point in the trajectory.
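Restarting the same simulation fails at a different point each time. One pragmatic workaround, which does not fix the underlying bug, is to advance the run in chunks and roll back to the last good checkpoint whenever a chunk fails. The sketch below shows only that control logic. run_with_restarts and its callbacks are hypothetical names, and FakeSimulation simply simulates one sporadic failure. With OpenMM, the save and restore hooks could presumably wrap the Simulation checkpoint methods (saveCheckpoint()/loadCheckpoint() in recent versions). However, a CUDA error 700 leaves the process's CUDA state unusable, so in practice the same loop would more likely live in a job script that relaunches Python from the last checkpoint file.

```python
def run_with_restarts(run_chunk, save_checkpoint, restore_checkpoint,
                      total_steps, chunk_steps, max_restarts=3):
    """Advance a simulation in chunks, rolling back to the last checkpoint on failure.

    run_chunk(n):         advances the simulation by n steps; raises on failure
    save_checkpoint():    records the current (known good) state
    restore_checkpoint(): returns the simulation to the last saved state
    All three are hypothetical hooks supplied by the caller.
    Returns the number of restarts that were needed.
    """
    done = 0
    restarts = 0
    save_checkpoint()
    while done < total_steps:
        n = min(chunk_steps, total_steps - done)
        try:
            run_chunk(n)
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up if failures keep recurring
            restore_checkpoint()
            continue  # retry the same chunk from the last good state
        done += n
        save_checkpoint()
    return restarts


class FakeSimulation:
    """Stand-in that fails once, on a fixed call, to mimic a sporadic CUDA error."""

    def __init__(self, fail_on_call):
        self.step = 0
        self.saved = 0
        self.calls = 0
        self.fail_on_call = fail_on_call

    def run_chunk(self, n):
        self.calls += 1
        if self.calls == self.fail_on_call:
            self.step += n // 2  # partial progress before the crash
            raise RuntimeError("CUDA error (700)")
        self.step += n

    def save(self):
        self.saved = self.step

    def restore(self):
        self.step = self.saved


sim = FakeSimulation(fail_on_call=3)
restarts = run_with_restarts(sim.run_chunk, sim.save, sim.restore,
                             total_steps=1000, chunk_steps=100)
print(sim.step, restarts)
```

Saving only after a chunk completes means a failure never leaves a partially advanced state as the restore point.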

In addition, some of my other trajectories, for proteins very similar in sequence to the ones above (differing by <10 residues in a ~250-residue protein), fail with this error:

Traceback (most recent call last):
File "equilibrate_npt.py", line 26, in <module>
simulation.step(writeFreq)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
self._simulate(endStep=self.currentStep+steps)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 219, in _simulate
reporter.report(self, state)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/pdbreporter.py", line 92, in report
PDBFile.writeModel(simulation.topology, state.getPositions(), self._out, self._nextModel)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/pdbfile.py", line 371, in writeModel
recordName, atomIndex%100000, atomName, resName, chainName, resId, _format_83(coords[0]),
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/pdbfile.py", line 454, in _format_83
'in a width-8 field' % f)
ValueError: coordinate "-291498111.632" could not be represented in a width-8 field
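This ValueError is only a symptom. PDB ATOM records store each coordinate (in Å) in a fixed 8-character column, normally written with %8.3f, so only values from about -999.999 to 9999.999 fit. The check below is written independently of OpenMM's _format_83, which may handle borderline values slightly differently, and illustrates the limit. fits_pdb_coordinate_field is a hypothetical helper:

```python
def fits_pdb_coordinate_field(value_angstrom):
    """True if the value fits the 8-character %8.3f coordinate field of a PDB ATOM record."""
    return len("%8.3f" % value_angstrom) <= 8


for x in (70.0, 9999.999, -999.999, -1000.0, -291498111.632):
    print("%16.3f -> %s" % (x, fits_pdb_coordinate_field(x)))
```

The coordinate in the traceback, about -2.9×10^8 Å, is millions of times larger than any realistic box for this system.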


What might the issues be?

Re: array deletion error

Posted: Fri Jan 06, 2017 2:00 pm
by peastman
That one looks like your simulation has blown up and the coordinates have grown to enormous values. It's hard to guess exactly why. I don't suppose you could post files that would let me reproduce the error? Although if it takes hours to reproduce and isn't deterministic, it's still going to be quite a challenge to debug.
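A blow-up like this is only noticed when something downstream fails, here the PDB writer. Checking coordinates explicitly at each report interval may let the job stop while a recent good frame is still on disk. Below is a minimal plain-Python heuristic. looks_blown_up and its factor threshold are hypothetical. With OpenMM, the positions would come from the State returned by simulation.context.getState(getPositions=True), converted to the same length unit as the box.

```python
import math


def looks_blown_up(positions, box_length, factor=10.0):
    """Heuristic blow-up check for one frame.

    positions:  iterable of (x, y, z) tuples, same length unit as box_length
    box_length: periodic box edge length
    factor:     how many box lengths from the origin count as "exploded";
                an arbitrary assumption, not an OpenMM parameter
    """
    limit = factor * box_length
    for position in positions:
        for coordinate in position:
            if math.isnan(coordinate) or math.isinf(coordinate):
                return True  # NaN/inf coordinates are never physical
            if abs(coordinate) > limit:
                return True
    return False


healthy = [(1.0, 2.0, 3.0), (68.5, 0.2, 35.0)]
exploded = healthy + [(-291498111.632, 0.0, 0.0)]
print(looks_blown_up(healthy, 70.0), looks_blown_up(exploded, 70.0))
```

A generous factor is used because unwrapped coordinates of diffusing molecules can legitimately drift outside the box.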

Peter

Re: array deletion error

Posted: Thu Jan 12, 2017 2:32 pm
by saurabhbelsare
Hi Dr. Eastman,

The input files can be found here:
https://www.dropbox.com/sh/ytttz7ju0y8h ... 6th2a?dl=0
This simulation died with the CUDA error before the first restart file was written, i.e. within the first 10 ps. However, I'm not sure whether this is deterministic, or whether it would die at exactly the same spot again.

Re: array deletion error

Posted: Thu Jan 12, 2017 4:16 pm
by peastman
I think I've fixed the problem: https://github.com/pandegroup/openmm/pull/1715. Could you give it a try with the latest GitHub revision?

Peter

Re: array deletion error

Posted: Wed Jan 18, 2017 12:05 pm
by saurabhbelsare
We downloaded and compiled the latest source from GitHub yesterday, and I submitted two test jobs. One of them died with the same CUDA 700 error I mentioned in my first post in this thread, and the other died with the following error:

Traceback (most recent call last):
File "equilibrate_npt.py", line 26, in <module>
simulation.step(writeFreq)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 132, in step
self._simulate(endStep=self.currentStep+steps)
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 194, in _simulate
self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
File "/opt/python/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 3566, in step
return _openmm.LangevinIntegrator_step(self, steps)
Exception: The periodic box size has decreased to less than twice the nonbonded cutoff.


This seems unlikely to have actually happened, since the cutoff I'm using is 1 nm while the box is ~70 Å on a side and contains ~40K atoms.
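The numbers in this post show how implausible the reported change is. Assuming a rectangular box, each edge must stay longer than twice the cutoff, which is 2 nm here. A ~7 nm box would therefore have to shrink by a factor of 3.5 per edge, roughly 43× in volume, before this check triggers. A minimal arithmetic sketch, where min_box_length is a hypothetical helper:

```python
def min_box_length(cutoff_nm):
    """Smallest allowed box edge (nm) for a given nonbonded cutoff: 2 * cutoff."""
    return 2.0 * cutoff_nm


box_nm = 7.0      # ~70 Angstrom, as reported
cutoff_nm = 1.0
limit_nm = min_box_length(cutoff_nm)
linear_shrink = box_nm / limit_nm     # factor by which each edge would have to shrink
volume_shrink = linear_shrink ** 3    # corresponding drop in volume
print(limit_nm, linear_shrink, volume_shrink)
```

A collapse of that size is far beyond what a barostat's small volume moves would normally produce. That suggests, though it does not prove, that the box vectors were corrupted, possibly by the same kind of blow-up reported earlier in the thread.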