Changing GPU kernels parameters on the fly
Hi,
I am wondering if there is a way to change the parameters of a simulation (such as the integration step size, or the temperature of a thermostat) on the fly, i.e. while the code is already running on a GPU. For example, how can I change the temperature used in AndersenThermostat AND keep all particle coordinates, velocities, force parameters, etc. in the global memory of the device, to use immediately with the new temperature?
Thank you a lot!
Sincerely,
Taras.
- Peter Eastman
- Posts: 2593
- Joined: Thu Aug 09, 2007 1:25 pm
Re: Changing GPU kernels parameters on the fly
Yes, you can do that. Here's the general principle: you can't modify any part of the System once the simulation has started, but you can modify the Integrator and the Context. So if you call setStepSize() or setTemperature() on the Integrator, that will work fine.
AndersenThermostat is a little different. It's a Force, and hence part of the System, so it cannot be modified. But if you look carefully, you'll notice that the properties it stores are called "defaultTemperature" and "defaultCollisionFrequency". Those are just the default values that get used until you change them. The actual values in use are stored as parameters in the Context, so you can change them by calling setParameter():

context.setParameter(AndersenThermostat::Temperature(), newTemp);

Peter
Re: Changing GPU kernels parameters on the fly
Dear Peter,
thank you very much for your detailed reply. I have another question: energy calculations are currently done only on the CPU, correct?
I am trying to find a way to set up a multi-GPU parallel tempering with OpenMM.
Thank you!
Sincerely,
Taras.
- Peter Eastman
Re: Changing GPU kernels parameters on the fly
No, energy calculations are done on the GPU along with the force calculations. Did you find something that claims they aren't? If so, it's badly out of date.
Peter
Re: Changing GPU kernels parameters on the fly
Hi Peter,
I think I got confused by this comment on lines 35-36 in HelloSodiumChloride.cpp, version 3.1:
// Currently energy calculation is not available in the GPU kernels so asking
// for it requires slow Reference Platform computation at reporting intervals.
I am also wondering if I can request several simulations to be executed concurrently on a chosen GPU, such that the data arrays of all the simulations are kept in the global memory of the device while the simulations are running. Or is this done automatically? For systems that are not too big, the global memory of the device could simultaneously accommodate data for many of them.
Thank you!
Sincerely,
Taras.
- Peter Eastman
Re: Changing GPU kernels parameters on the fly
Thanks, I'll fix that. That comment hasn't been true for a long time.
If you want to run multiple simulations at once, that's fine. Just create a separate Context for each one. There are a few things to be aware of.
1. If using the CUDA platform, you can only have one Context per CPU thread. This is related to how CUDA works: it binds each GPU context to a single CPU thread, so if you want to create multiple Contexts, you need to spawn a separate thread for each one, then make sure each Context is created on a different thread and only ever accessed from that thread. The OpenCL platform doesn't have this limitation, so I'd recommend using it.
2. You need enough device memory to hold all the data for all the Contexts at once. Whether that's a problem depends on how large your system is, how many Contexts you want, and how much memory your GPU has.
3. GPUs are pretty bad at multitasking. If you try to have two different simulations actually executing at the same time, it will be much much slower than if you just executed one, then executed the other. It's fine to have two Contexts existing at once, but you should only call step() or getState() on one of them at a time, and not ask any of the others to do anything until that call has finished. If you just do everything from one thread (using the OpenCL platform, of course), this will happen automatically so you don't need to worry about it. But if using multiple threads, you need to coordinate them carefully.
Peter
Re: Changing GPU kernels parameters on the fly
Dear Peter,
just want to say thank you so very much for your time and explanations!
Sincerely,
Taras.
Re: Changing GPU kernels parameters on the fly
Dear Peter,
I have a C++ code in which I created an array of Contexts, each assigned to a different GPU by setting properties["OpenCLDeviceIndex"] = deviceString. When I run the integration within each context i by calling integrator->step(numberOfSteps) in a loop over i, the code seems to wait for the integration in context i to finish, and only then launches the integration in the next context, i+1.
In other words, when I try to run N independent parallel simulations on N GPUs from within one C++ code, the execution time scales in proportion to N.
Would you please suggest how I could make OpenMM do calculations in parallel on different GPUs from one cpp code using the “OpenCL” platform?
A similar question about the "CUDA" platform: if in the above situation I use "Cuda", the code compiles, but upon execution I get the error:
"Error setting device flags setting the device when a process is active is not allowed". Is this a CUDA error? Is it possible to run several OpenMM simulations concurrently on _different_ GPUs from within one C++ code on the "CUDA" platform?
Thank you a lot!
Sincerely,
Taras.
Re: Changing GPU kernels parameters on the fly
Dear Peter,
I would like to narrow my question to its "OpenCL" part. I think I can request the list of all available GPUs for each context, so that upon integration multiple GPUs are scheduled in parallel by OpenCL. However, is it possible to request, say, GPU #0 for context #0, GPU #1 for context #1, etc., and run the calculations in parallel by controlling which context executes where?
Thank you!
Sincerely,
Taras.
- Peter Eastman
Re: Changing GPU kernels parameters on the fly
Yes, you can do that. What complicates it is that calls to execute things on the GPU may not return right away. Anything that requires retrieving information from the GPU will definitely block until all kernels involved in calculating it have finished executing. But even without that, there are lots of things that can cause the CPU to wait for the GPU. For example, the GPU has a finite-sized command queue, so once that queue fills up, any further request to perform calculations will block. So you need to assume that if you call step(1000) on the integrator, it won't return until a large fraction of those steps (possibly all of them) have been executed.
So it looks like even with OpenCL, you'll need to use multiple threads, one for each Context. Each thread can call step() on its integrator, and it won't block the other threads that are handling other GPUs.
Peter