
OpenCL unavailable device issue

Posted: Wed Aug 18, 2010 7:40 am
by jadelman
I was wondering if anyone has seen the following happen before, and if there is a simple solution, short of a system restart.

I recently used `kill <pid>` to terminate a PyOpenMM script that uses the OpenCL platform before it had finished. I've done this in the past without issues, but this time, when I went to restart the program after making some small parameter changes, I got the following error:


Traceback (most recent call last):
  File "teststatestability_12.py", line 70, in <module>
    context = openmm.Context(system, integrator, platform)
  File "/home/jadelman/python-2.6.5/lib/python2.6/site-packages/simtk/chem/openmm/openmm.py", line 1257, in __init__
    this = _openmm.new_Context(*args)
Exception: Error initializing context: clCreateContextFromType (-2)

The -2 error equates to CL_DEVICE_NOT_AVAILABLE, I believe. The machine has two Tesla C1060s; the other device was not affected, and deviceQuery showed both GPUs.

It seems like killing OpenMM this way might have been ungraceful and left the GPU in some sort of memory-locked state. I've done this before without issue, so as far as I can tell it is not an ongoing problem. Our solution thus far has been to just restart the machine, which seems to work, but is obviously not ideal.
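
In case it is relevant, one thing I may try next time instead of a bare `kill` is trapping SIGTERM so the script can tear down its Context before exiting. This is only a rough sketch of that idea; the tiny stand-in system and the assumption that destroying the Context releases the OpenCL resources cleanly are mine, not something I've verified:

import signal

from simtk.chem.openmm import openmm  # same import layout as in the traceback above

def handle_term(signum, frame):
    # Raise so the `finally` block below runs before the interpreter exits.
    raise SystemExit("terminated by signal %d" % signum)

signal.signal(signal.SIGTERM, handle_term)

# Tiny stand-in system just so the example is self-contained; the real
# script builds its own system and integrator.
system = openmm.System()
system.addParticle(1.0)
integrator = openmm.VerletIntegrator(0.001)

platform = openmm.Platform.getPlatformByName("OpenCL")
platform.setPropertyDefaultValue("OpenCLDeviceIndex", str(0))

context = openmm.Context(system, integrator, platform)
try:
    pass  # set positions and run the simulation here
finally:
    # Dropping the Context should give OpenMM a chance to release its
    # OpenCL resources before the process goes away.
    del context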

RE: OpenCL unavailable device issue

Posted: Wed Aug 18, 2010 9:23 am
by jadelman
Following up on my original post, after a restart of the machine, I'm getting the same error message, but from the other GPU card (the one that was giving me the error is working fine now). I have not recently updated or changed OpenMM or PyOpenMM, and the pyopenmm scripts are ones that I've used without issue previously. I'm using OpenMM revision 2340 and PyOpenMM revision 755, and select the GPU using:
platform.setPropertyDefaultValue("OpenCLDeviceIndex",str(0)) -- or str(1)
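
For what it's worth, the two scripts differ only in the index string passed above. The read-back at the end of the snippet below is just my own (unverified) sanity check that the Context really ended up on the device I asked for, assuming getPropertyValue reports the same property name back:

from simtk.chem.openmm import openmm

platform = openmm.Platform.getPlatformByName("OpenCL")
# One copy of the script uses str(0) here and the other uses str(1).
platform.setPropertyDefaultValue("OpenCLDeviceIndex", str(0))

# Minimal stand-in system just to get a Context for the check below.
system = openmm.System()
system.addParticle(1.0)
integrator = openmm.VerletIntegrator(0.001)
context = openmm.Context(system, integrator, platform)

# My assumption: reading the property back from the live Context confirms
# which device index was actually used.
print(platform.getPropertyValue(context, "OpenCLDeviceIndex"))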

Any ideas why this might be cropping up now, when I haven't had problems previously?

RE: OpenCL unavailable device issue

Posted: Wed Aug 18, 2010 11:48 am
by jadelman
Problem solved. The two GPUs had inadvertently been switched into Exclusive compute mode, and changing them back to Default compute mode seems to have fixed things.

The thing I still don't understand is why, if I select the GPU device for each independent simulation and set the devices to be different for each, running on one GPU would lock out the other.