error with AMOEBA simulation
Posted: Tue Mar 07, 2023 8:04 am
Hi OpenMM,
I am trying to run an AMOEBA simulation of a rather large system (about 81,000 atoms). Solvent minimization and equilibration ran fine, and so did the additional minimizations after that. During the temperature annealing / heating step, however, the run takes far longer than it should for the amount of simulated time, and then four days in it crashes with the following error: openmm.OpenMMException: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700). The full traceback is below. I'm not sure what to do. Could the system simply be too big for the GPU memory with this force field?
Traceback (most recent call last):
  File "/home/dk758/project/myTools/MD/eq_md_amoeba_full.py", line 384, in <module>
    heat("min.pdb", "heat.pdb")
  File "/home/dk758/project/myTools/MD/eq_md_amoeba_full.py", line 265, in heat
    sim.step(1)
  File "/gpfs/gibbs/project/hammes_schiffer/dk758/conda_envs/sfg/lib/python3.9/site-packages/openmm/app/simulation.py", line 134, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/gpfs/gibbs/project/hammes_schiffer/dk758/conda_envs/sfg/lib/python3.9/site-packages/openmm/app/simulation.py", line 204, in _simulate
    self.integrator.step(stepsToGo)
  File "/gpfs/gibbs/project/hammes_schiffer/dk758/conda_envs/sfg/lib/python3.9/site-packages/openmm/openmm.py", line 8405, in step
    return _openmm.LangevinIntegrator_step(self, steps)
openmm.OpenMMException: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)
terminate called after throwing an instance of 'OpenMM::OpenMMException'
  what(): Error deleting array param1: CUDA_ERROR_ILLEGAL_ADDRESS (700)
/var/spool/slurmd/job16345443/slurm_script: line 24: 31261 Aborted ${SCRIPT} Ab_1.pdb /home/dk758/palmer_scratch/Ab_1_amoeba_10ns_dt3fs.ncdf 10000