CL_OUT_OF_RESOURCES with RPMD Integrator

Nabil Faruk · Post by **Nabil Faruk** » Mon Jan 21, 2013 3:43 pm

Hi, I have a couple of questions (OpenMM 4.1.1):
1. When I specify more than 450 copies for the RPMD integrator (regardless of the number of atoms), I get the following error during integration with the OpenCL platform:

OpenCL internal error: CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on Tesla M2070 (Device 0)

From briefly reading about it, it seems like this is due to a limit on the workgroup size?

2. It seems that the copies are handled sequentially instead of in parallel, because with a small system of only a few atoms the Reference platform is faster than OpenCL, even with hundreds of copies. Only when the number of atoms is increased does the OpenCL platform win out. In OpenCLRpmdKernels.cpp (OpenMM 4.1.1), starting from line 125, I see:

// Loop over copies and compute the force on each one.

copyToContextKernel.setArg<cl::Buffer>(0, positions->getDeviceBuffer());
copyToContextKernel.setArg<cl::Buffer>(1, cl.getPosq().getDeviceBuffer());
copyToContextKernel.setArg<cl::Buffer>(2, cl.getAtomIndex().getDeviceBuffer());
copyFromContextKernel.setArg<cl::Buffer>(0, cl.getForce().getDeviceBuffer());
copyFromContextKernel.setArg<cl::Buffer>(1, forces->getDeviceBuffer());
copyFromContextKernel.setArg<cl::Buffer>(2, cl.getAtomIndex().getDeviceBuffer());
if (!forcesAreValid)
    computeForces(context);

I don't understand OpenCL yet (I will start reading up on it), but the comment seems to indicate that the copies are handled one at a time. Was this done to prevent memory issues with large systems?

Thanks,

Peter Eastman · Post by **Peter Eastman** » Thu Jan 24, 2013 3:42 pm

From briefly reading about it, it seems like this is due to a limit on the workgroup size?

Yes, I believe you're correct. Is that limit a problem for you? When I was developing this feature, I talked to people who do RPMD simulations and was told that about 32 is the largest number of copies that anyone ever uses, and even that is probably more than they really need. What are you doing that requires so many copies?

It seems that the copies are handled sequentially instead of in parallel

The word "handled" covers a lot of things here. The atomic forces are calculated for the copies one at a time (but each force calculation, like all force calculations, is parallelized across the entire GPU). The thermostat and integration algorithms are then applied to all copies at once. In fact, the algorithm involves taking a Fourier transform of the data for a single atom across all copies.

Peter
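The structure described above can be sketched in plain C++. This is only an illustration under stated assumptions, not OpenMM's implementation: the function names computeForcesForCopy, computeAllForces, and transformAcrossCopies are hypothetical, the force is a toy harmonic one, and the transform is a naive O(n^2) discrete Fourier transform rather than whatever FFT the GPU kernels actually use. It shows forces being evaluated one copy at a time, followed by a transform of a single atom's coordinate across all copies.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a force evaluation on ONE copy (toy harmonic force).
// In the real code this step is where the work is spread across the GPU.
std::vector<double> computeForcesForCopy(const std::vector<double>& positions) {
    std::vector<double> forces(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i)
        forces[i] = -positions[i];
    return forces;
}

// Forces are evaluated copy by copy: positions[copy][atom] -> forces[copy][atom].
std::vector<std::vector<double>> computeAllForces(
        const std::vector<std::vector<double>>& positions) {
    std::vector<std::vector<double>> forces;
    forces.reserve(positions.size());
    for (const std::vector<double>& copy : positions)
        forces.push_back(computeForcesForCopy(copy));
    return forces;
}

// Naive DFT of a single atom's coordinate across all copies.
// Each atom needs data from every copy, so this step couples the copies.
std::vector<std::complex<double>> transformAcrossCopies(
        const std::vector<std::vector<double>>& positions, std::size_t atom) {
    const std::size_t numCopies = positions.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> modes(numCopies);
    for (std::size_t k = 0; k < numCopies; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < numCopies; ++j) {
            const double angle = -2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(j) /
                                 static_cast<double>(numCopies);
            sum += positions[j][atom] *
                   std::complex<double>(std::cos(angle), std::sin(angle));
        }
        modes[k] = sum;
    }
    return modes;
}
```

In this sketch each atom's transform reads one value from every copy, which is one plausible reason the number of copies could run into a per-workgroup limit on the device, although the thread does not confirm the exact mechanism.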

Nabil Faruk · Post by **Nabil Faruk** » Tue Jan 29, 2013 10:30 am

Thanks for the response. Regarding the limit on the number of copies: we are interested in accelerating PIMD/RPMD simulations of low-temperature molecular systems using OpenMM. At low temperature the number of beads (Fourier modes) can get very large. This number is not known a priori, so a convergence study is required. Here is a link to a study by one of our former group members on low-temperature doped helium clusters; he needed up to 900 copies for his convergence studies (he was using our PIMD implementation in MMTK).

Peter Eastman · Post by **Peter Eastman** » Tue Jan 29, 2013 11:43 am

Thanks, that's good to know. I've entered this into the tracker (https://simtk.org/tracker/index.php?fun ... 1&atid=436), so hopefully we can fix it.

Peter

Nabil Faruk · Post by **Nabil Faruk** » Thu Feb 28, 2013 2:47 pm

Thanks
