Setting "OpenCLDeviceIndex" using the python API

The functionality of OpenMM will (eventually) include everything that one would need to run modern molecular simulation.
Benjamin Trendelkamp-Schroer
Posts: 12
Joined: Fri Feb 24, 2012 11:49 am

Post by Benjamin Trendelkamp-Schroer » Thu Jan 17, 2013 8:36 am

Hello,

I am running OpenMM on a machine with 7 GeForce GTX 580 GPUs. I want to select a specific GPU device for my simulation.

Here, simulation is an instance of simtk.openmm.app.Simulation.

I do

Code: Select all

simulation.context.getPlatform().setPropertyValue(simulation.context, "OpenCLDeviceIndex", "0,1,2")
print(simulation.context.getPlatform().getPropertyValue(simulation.context, "OpenCLDeviceIndex"))
but upon execution it prints only device-id zero.

It seems that setPropertyValue has no effect at all. What am I doing wrong here, or what could be the cause of this?

Thanks,

Ben

Peter Eastman
Posts: 2568
Joined: Thu Aug 09, 2007 1:25 pm

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Peter Eastman » Thu Jan 17, 2013 11:05 am

Hi Ben,

The device gets fixed when the Context is created, so if you change the property after that it has no effect.

Try creating your Simulation like this:

Code: Select all

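from simtk.openmm import Platform
from simtk.openmm.app import Simulation

# topology, system and integrator are the objects you already have
platform = Platform.getPlatformByName("OpenCL")
properties = {"OpenCLDeviceIndex": "0,1,2"}
simulation = Simulation(topology, system, integrator, platform, properties)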
Peter
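
Editor's sketch: the approach above can be wrapped in a small helper that also reads the property back from the live Context, which is the check Ben attempted in his first post. The function names device_properties and create_simulation are hypothetical. The property names "OpenCLDeviceIndex" and "CudaDevice" are the ones used in this thread and may differ in other OpenMM versions.

```python
def device_properties(platform_name, device_ids):
    """Return the platform-properties dict that selects the given device(s)."""
    # Property names as they appear in this thread (assumption: they match
    # the OpenMM version you are running).
    keys = {"OpenCL": "OpenCLDeviceIndex", "CUDA": "CudaDevice"}
    if platform_name not in keys:
        raise ValueError("no device property known for platform %r" % platform_name)
    # Property values are passed as strings, e.g. "0,1,2".
    return {keys[platform_name]: ",".join(str(d) for d in device_ids)}


def create_simulation(topology, system, integrator, platform_name, device_ids):
    """Create a Simulation on the requested device(s) and report the setting."""
    # Imported lazily so device_properties() works without OpenMM installed.
    from simtk.openmm import Platform
    from simtk.openmm.app import Simulation

    platform = Platform.getPlatformByName(platform_name)
    properties = device_properties(platform_name, device_ids)
    # The device is fixed here, when the Context is created.
    simulation = Simulation(topology, system, integrator, platform, properties)
    # Read the value back from the Context to confirm what was applied.
    for key in properties:
        value = platform.getPropertyValue(simulation.context, key)
        print("%s = %s" % (key, value))
    return simulation
```

For example, device_properties("OpenCL", [0, 1, 2]) gives the same dictionary as in Peter's snippet.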

Benjamin Trendelkamp-Schroer
Posts: 12
Joined: Fri Feb 24, 2012 11:49 am

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Benjamin Trendelkamp-Schroer » Fri Jan 18, 2013 6:05 am

Hi Peter,

thank you very much for your helpful reply.

Is there a bound on the number of simulation objects that I can assign to the same device_id?

How is concurrent execution on different devices implemented?

Assume I have assigned n simulations to n different devices.

Does

Code: Select all

for device_id in range(n):
    simulation[device_id].step(N)
execute in parallel on all devices, or does it execute in serial (but still on different devices)?

If it does not execute in parallel, how do I achieve parallel execution?

If it executes in parallel how do I synchronize?

Please excuse me if I have overlooked something in the documentation that clarifies these issues.

Thanks,

Ben

Peter Eastman
Posts: 2568
Joined: Thu Aug 09, 2007 1:25 pm

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Peter Eastman » Fri Jan 18, 2013 11:52 am

Hi Ben,

There are a few issues with having multiple simulations on one device. First, GPUs are very bad at multitasking, so you don't want to actually have two of them executing at the same time. If you do, your performance will go way down.

It's fine to have multiple simulations existing at once, as long as only one is doing any calculations at a time. They have to share various resources, though, such as GPU memory, driver workspace memory, etc., which will limit how many you can create at once.

Don't make any assumptions about how synchronous or asynchronous step() will be. This depends on various factors, some of which are up to the driver. It's possible that step() will block for much of the calculation, but return before it's completely done. So if you want to run several different simulations (on different devices) at the same time, it's best to create a thread or subprocess for each one.

Important note if you're working in Python: OpenMM 4.1.1 does not release the Global Interpreter Lock when you call step(), so it will block all other threads. That means you really need to use processes instead (which is pretty easy with the subprocess module). OpenMM 5.0 does release it, so threads should work much better.

Peter
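
Editor's sketch: one way to follow the "one subprocess per device" advice with the standard subprocess module. The script name and its argument order (device ID, temperature, number of steps) are hypothetical. Waiting for every process to exit serves as the synchronization point Ben asked about.

```python
import subprocess
import sys


def worker_command(script, device_id, temperature_kelvin, n_steps):
    """Build the command line for one worker process (one per GPU)."""
    # Hypothetical argument order: DEVICE_ID TEMPERATURE_K N_STEPS.
    return [sys.executable, script, str(device_id),
            str(temperature_kelvin), str(n_steps)]


def run_block(commands):
    """Start all commands at once, then wait for every one to finish.

    Returns the exit codes in the same order as the commands. When this
    function returns, every worker has completed, so it acts as a barrier.
    """
    # Popen returns immediately, so the workers run concurrently.
    processes = [subprocess.Popen(command) for command in commands]
    # wait() blocks until the corresponding process has exited.
    return [process.wait() for process in processes]
```

With 7 GPUs this could be called as run_block([worker_command("worker.py", d, 300.0, 1000) for d in range(7)]), where worker.py is a script that builds a single Simulation on the given device and calls step().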

Benjamin Trendelkamp-Schroer
Posts: 12
Joined: Fri Feb 24, 2012 11:49 am

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Benjamin Trendelkamp-Schroer » Mon Jan 21, 2013 8:30 am

Hi Peter,

thanks a lot for the very helpful explanation.

There seems to be an upper bound on how many Simulation objects OpenMM can allocate for the OpenCL platform. Once I try to create more than 19 Simulation objects, the Python interpreter exits with the following error message:

Code: Select all

Traceback (most recent call last):
  File "remd_multiprocess.py", line 79, in <module>
    simulation=Simulation(pdb.topology, system, integrator, platform, properties)
  File "/home/trendelkamp/.local/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 79, in __init__
    self.context = mm.Context(system, integrator, platform, platformProperties)
  File "/home/trendelkamp/.local/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 4594, in __init__
    this = _openmm.new_Context(*args)
Exception: Error compiling kernel: 
The behaviour for the CUDA platform is different. I can assign only one Simulation object to a device. If I try to assign more than one simulation to a single device I get the following error message.

Code: Select all

Traceback (most recent call last):
  File "remd_multiprocess.py", line 79, in <module>
    simulation=Simulation(pdb.topology, system, integrator, platform, properties)
  File "/home/trendelkamp/.local/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 79, in __init__
    self.context = mm.Context(system, integrator, platform, platformProperties)
  File "/home/trendelkamp/.local/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 4594, in __init__
    this = _openmm.new_Context(*args)
Exception: Error setting device flags cannot set while device is active in this process
I want to perform replica exchange MD on N_DEVICE GPUs, with each device integrating at a single temperature. Most likely there will be more temperatures than available devices, so I have to split the work into blocks of N_DEVICE simulations, with parallel execution within each block.

This is why I wanted to assign more than one simulation to a single device, but maybe this is not the preferred strategy.

Thanks,

Ben
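
Editor's sketch: the block decomposition described above, written as plain Python. The function name split_into_blocks is hypothetical. Each block pairs a device ID with a temperature, so that within a block every device handles exactly one temperature.

```python
def split_into_blocks(temperatures, n_devices):
    """Yield blocks of (device_id, temperature) pairs.

    Each block holds at most n_devices entries; the last block may be
    shorter if the number of temperatures is not a multiple of n_devices.
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    for start in range(0, len(temperatures), n_devices):
        block = temperatures[start:start + n_devices]
        # enumerate() assigns device IDs 0, 1, ... within the block.
        yield list(enumerate(block))
```

The blocks are meant to be processed one after another, with the simulations inside a block running concurrently, one per device.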

Benjamin Trendelkamp-Schroer
Posts: 12
Joined: Fri Feb 24, 2012 11:49 am

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Benjamin Trendelkamp-Schroer » Mon Jan 21, 2013 8:34 am

Note: Strangely, the error for the OpenCL platform occurs when assigning to a single device as well as when assigning to multiple devices.

Peter Eastman
Posts: 2568
Joined: Thu Aug 09, 2007 1:25 pm

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Peter Eastman » Mon Jan 21, 2013 11:13 am

19 simulations on one device sounds reasonable for the upper limit. You'll only get up to even that many if each of them is a pretty small system. Otherwise, you'll run out of device memory before that.

The CUDA platform in 4.1.1 only allows one simulation per thread at any time, so if you want more than one you need to use multiple threads. In 5.0 it won't have that problem.

Peter
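
Editor's sketch: a generic way to give each simulation its own thread, as suggested above for the CUDA platform in 4.1.1. The helper run_in_threads is hypothetical; each task is assumed to be a callable that creates its own Simulation inside the thread and steps it. Note that, per the earlier reply, OpenMM 4.1.1 holds the Global Interpreter Lock during step(), so the threads will not actually overlap in that version.

```python
import threading


def run_in_threads(tasks):
    """Run each callable in its own thread and return the results in order."""
    results = [None] * len(tasks)

    def runner(index, task):
        # Each task runs entirely inside its own thread.
        results[index] = task()

    threads = [threading.Thread(target=runner, args=(index, task))
               for index, task in enumerate(tasks)]
    for thread in threads:
        thread.start()
    # join() blocks until the thread has finished, so this is the sync point.
    for thread in threads:
        thread.join()
    return results
```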

Benjamin Trendelkamp-Schroer
Posts: 12
Joined: Fri Feb 24, 2012 11:49 am

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Benjamin Trendelkamp-Schroer » Tue Jan 22, 2013 12:43 am

Hi Peter,

thanks a lot for your helpful suggestions.

Ben

Benjamin Trendelkamp-Schroer
Posts: 12
Joined: Fri Feb 24, 2012 11:49 am

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Benjamin Trendelkamp-Schroer » Wed Jan 30, 2013 7:05 am

As a follow-up to my previous questions, I noticed some very strange behaviour when trying to use OpenMM 4.1.1 on multiple GPUs.

For the OpenCL platform, the constructor of Simulation seems to allocate memory on each of my devices. In addition, this behaviour is not affected by setting the "OpenCLDeviceIndex" property.

For the CUDA platform, I can allocate only one simulation per device.

Code: Select all

# N_MAX_DEVICES is the number of available GPUs (defined elsewhere)
simulations = []
for i in range(N_MAX_DEVICES):
    device_id = i % N_MAX_DEVICES
    properties = {"CudaDevice": str(device_id)}
    integrator = LangevinIntegrator(temperatures[i], collision_rate, timestep)
    simulation = Simulation(pdb.topology, system, integrator, platform, properties)
    simulations.append(simulation)
But a subsequent

Code: Select all

for s in simulations:
    s.step(n)
gives the following error.

Code: Select all

Traceback (most recent call last):
  File "remd_multiprocess_old.py", line 76, in <module>
    simulation.step(10)
  File "/home/trendelkamp/.local/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 107, in step
    self.integrator.step(stepsToGo)
  File "/home/trendelkamp/.local/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 9364, in step
    return _openmm.LangevinIntegrator_step(self, *args)
Exception: Error: unknown error launching kernel kCalculateCDLJEwaldForces
Identical code does work if I am running on OpenCL.

A workaround that fixed this is to have a separate process for each device. Each process needs to:
i) import its own openmm modules
ii) construct a full set of the objects required to instantiate Simulation()
iii) perform step()

Maybe importing the openmm module creates global variables that cannot be shared between Simulation objects on different devices.

Thanks,

Ben
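
Editor's sketch: what a per-device worker following steps i) to iii) might look like. This is not Ben's actual script. The command-line layout, the function names, the force field files and the integrator settings are all assumptions chosen for illustration; the "CudaDevice" property name is the one used in this thread.

```python
def parse_worker_args(argv):
    """Parse [DEVICE_ID, TEMPERATURE_K, N_STEPS] (hypothetical layout)."""
    if len(argv) != 3:
        raise SystemExit("usage: worker.py DEVICE_ID TEMPERATURE_K N_STEPS")
    return int(argv[0]), float(argv[1]), int(argv[2])


def run_worker(pdb_path, device_id, temperature_kelvin, n_steps):
    """Build one Simulation on one CUDA device and advance it by n_steps."""
    # i) import the OpenMM modules inside this process
    from simtk import unit
    from simtk.openmm import LangevinIntegrator, Platform
    from simtk.openmm.app import ForceField, PDBFile, Simulation

    # ii) construct every object that Simulation() needs
    pdb = PDBFile(pdb_path)
    # Assumed force field files; substitute the ones your system requires.
    forcefield = ForceField("amber99sb.xml", "tip3p.xml")
    system = forcefield.createSystem(pdb.topology)
    # Assumed friction coefficient and time step, for illustration only.
    integrator = LangevinIntegrator(temperature_kelvin * unit.kelvin,
                                    1.0 / unit.picosecond,
                                    1.0 * unit.femtosecond)
    platform = Platform.getPlatformByName("CUDA")
    properties = {"CudaDevice": str(device_id)}
    simulation = Simulation(pdb.topology, system, integrator,
                            platform, properties)
    simulation.context.setPositions(pdb.positions)

    # iii) perform the integration steps
    simulation.step(n_steps)
    return simulation
```

A script built on this would call parse_worker_args(sys.argv[1:]) and pass the result to run_worker together with the path of its input PDB file.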

Peter Eastman
Posts: 2568
Joined: Thu Aug 09, 2007 1:25 pm

Re: Setting "OpenCLDeviceIndex" using the python API

Post by Peter Eastman » Wed Jan 30, 2013 6:08 pm

Hi Ben,

Both of these problems are fixed in OpenMM 5.0. Note that OpenCL isn't really using memory on all the devices, though it may look like it when viewed with certain tools. It creates a context that spans all devices, then allocates memory within that context. This makes it appear that memory is being reserved on all the devices, but physical memory doesn't actually get allocated until the first time you try to access that memory on a given device - which is never for all but one of them.

Peter
