
Single and double precision produce different result

Posted: Tue Nov 03, 2020 12:40 pm
by luwei0917
Hello Peter,
We have encountered a strange bug: single and double precision give different energies for our CustomHbondForce.
Attached is a much-simplified script, so you should be able to reproduce it easily (run "python simplified_code.py" for single precision and "python simplified_code.py -d" for double precision).
Executing the script, I got:
weilu@weis-MacBook-Pro-2 cleaned_to_forum % python simplified_code.py
TotalEnergy 120.0 kJ/mol
weilu@weis-MacBook-Pro-2 cleaned_to_forum % python simplified_code.py -d
TotalEnergy 98.0 kJ/mol

Thanks a lot.
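
For readers without the attachment, here is a minimal sketch of how a script like this could map the "-d" and "-p" switches to OpenMM platform properties. It is not the attached simplified_code.py: the helper names, the "--double"/"--platform" long options, and the OpenCL default are assumptions made for illustration. The "Precision" property is the one the CUDA and OpenCL platforms accept ("single", "mixed", or "double"); the CPU and Reference platforms are given no precision property here.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line switches used in this thread (-d and -p)."""
    parser = argparse.ArgumentParser(description="Evaluate the energy at a chosen precision")
    # -d appears in the thread; the long name --double is a hypothetical addition.
    parser.add_argument("-d", "--double", action="store_true", help="use double precision")
    # -p appears in the thread; the OpenCL default is an assumption.
    parser.add_argument("-p", "--platform", default="OpenCL", help="OpenMM platform name")
    return parser.parse_args(argv)


def platform_properties(platform_name, use_double):
    """Return the property dict to pass to OpenMM for the requested precision."""
    if platform_name in ("CUDA", "OpenCL"):
        return {"Precision": "double" if use_double else "single"}
    # No precision property is set for the CPU and Reference platforms.
    return {}


def build_simulation(topology, system, integrator, args):
    """Create a Simulation on the requested platform (requires OpenMM)."""
    # Imported lazily so the helpers above can be used without OpenMM installed.
    from simtk.openmm import Platform
    from simtk.openmm.app import Simulation

    platform = Platform.getPlatformByName(args.platform)
    properties = platform_properties(args.platform, args.double)
    return Simulation(topology, system, integrator, platform, properties)
```

With this layout, "python script.py -d -p CUDA" would request a double-precision CUDA context, which is the combination that produced CUDA_ERROR_ILLEGAL_ADDRESS later in this thread.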

Re: Single and double precision produce different result

Posted: Tue Nov 03, 2020 1:13 pm
by cabb99
Dear Peter,
I just wanted to add to this discussion that the single-precision version of the code gets the same results as the CPU and Reference platforms. With double precision, we get other errors on other computers.

(Quadro K600)

$ python simplified_code.py 
TotalEnergy 120.0 kJ/mol
$ python simplified_code.py -d
OpenCL internal error: CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on Quadro K600 (Device 0).

Traceback (most recent call last):
  File "simplified_code.py", line 126, in <module>
    state = simulation.context.getState(getEnergy=True)
  File "/home/cab22/Programs/anaconda3/envs/py36/lib/python3.6/site-packages/simtk/openmm/openmm.py", line 18543, in getState
    state = _openmm.Context_getState(self, types, enforcePeriodicBox, groups_mask)
Exception: Error downloading array energySum: clEnqueueReadBuffer (-5)

(Quadro P1000)

$ python simplified_code.py
TotalEnergy 120.0 kJ/mol
$ python simplified_code.py -d
OpenCL internal error: CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on Quadro P1000 (Device 0).
Traceback (most recent call last):
  File "simplified_code.py", line 126, in <module>
    state = simulation.context.getState(getEnergy=True)
  File "/home/wl52/anaconda3/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 18543, in getState
    state = _openmm.Context_getState(self, types, enforcePeriodicBox, groups_mask)
Exception: Error downloading array energySum: clEnqueueReadBuffer (-5)
$ python simplified_code.py -d -p CUDA
Traceback (most recent call last):
  File "simplified_code.py", line 126, in <module>
    state = simulation.context.getState(getEnergy=True)
  File "/home/wl52/anaconda3/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 18543, in getState
    state = _openmm.Context_getState(self, types, enforcePeriodicBox, groups_mask)
Exception: Error downloading array energySum: CUDA_ERROR_ILLEGAL_ADDRESS (700)
terminate called after throwing an instance of 'OpenMM::OpenMMException'
  what():  Error deleting array cmMomentum: CUDA_ERROR_ILLEGAL_ADDRESS (700)
Aborted (core dumped)
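
The cross-platform comparison described in this post can be sketched as a small check that flags any configuration whose energy disagrees with the Reference platform. The function name and the relative tolerance are assumptions; the numbers are the ones reported in this thread, with the CPU and Reference values taken to equal the single-precision result because the post says they match it.

```python
def find_outliers(energies, reference="Reference", rel_tol=1e-3):
    """Return {name: relative_difference} for energies that disagree with the reference.

    rel_tol is an assumed threshold; rounding differences between single and
    double precision are normally far smaller than the gap reported here.
    """
    ref = energies[reference]
    outliers = {}
    for name, energy in energies.items():
        rel_diff = abs(energy - ref) / max(abs(ref), 1e-12)
        if rel_diff > rel_tol:
            outliers[name] = rel_diff
    return outliers


# Energies in kJ/mol as reported in the thread.
reported = {
    "Reference": 120.0,
    "CPU": 120.0,
    "GPU single": 120.0,
    "GPU double": 98.0,
}

for name, rel_diff in find_outliers(reported).items():
    print(f"{name} differs from Reference by {rel_diff:.1%}")
```

A gap of this size is far larger than floating-point rounding would explain, which is consistent with the double-precision path failing outright on the other machines.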

Re: Single and double precision produce different result

Posted: Wed Nov 18, 2020 11:04 am
by luwei0917
I'm using OpenMM 7.4.1, py36_cuda101_rc_1.

Re: Single and double precision produce different result

Posted: Wed Dec 09, 2020 1:32 pm
by cabb99
I wonder if there have been any updates on this problem.