1. When I specify more than 450 copies for the RPMD integrator (regardless of the number of atoms), I get the following error during integration on the OpenCL platform:
OpenCL internal error: CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on Tesla M2070 (Device 0)
2. It seems that the copies are handled sequentially rather than in parallel: for a small system of only a few atoms, the Reference platform is faster than OpenCL, even with hundreds of copies. Only when the number of atoms is increased does the OpenCL platform win out. In OpenCLRpmdKernels.cpp (OpenMM 4.1.1), starting at line 125, I see:
// Loop over copies and compute the force on each one.
copyToContextKernel.setArg<cl::Buffer>(0, positions->getDeviceBuffer());
copyToContextKernel.setArg<cl::Buffer>(1, cl.getPosq().getDeviceBuffer());
copyToContextKernel.setArg<cl::Buffer>(2, cl.getAtomIndex().getDeviceBuffer());
copyFromContextKernel.setArg<cl::Buffer>(0, cl.getForce().getDeviceBuffer());
copyFromContextKernel.setArg<cl::Buffer>(1, forces->getDeviceBuffer());
copyFromContextKernel.setArg<cl::Buffer>(2, cl.getAtomIndex().getDeviceBuffer());
if (!forcesAreValid)
    computeForces(context);
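To make the question concrete, here is a minimal, self-contained C++ sketch of the sequential pattern I think the excerpt implies: each copy's positions are copied into one shared context buffer, forces are computed for that copy, and the result is copied back before the next copy starts. This is plain CPU code and not OpenMM's actual implementation; `computeForcesForContext`, `computeAllCopiesSequentially`, and the harmonic force are hypothetical stand-ins I made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-context force evaluation:
// a harmonic restraint F = -k * x applied to every coordinate.
void computeForcesForContext(const std::vector<double>& contextPos,
                             std::vector<double>& contextForce, double k) {
    for (std::size_t i = 0; i < contextPos.size(); ++i)
        contextForce[i] = -k * contextPos[i];
}

// Sequential pattern (assumed): one force evaluation per copy.
// positions and forces are flattened as [copy][coordinate].
// Returns the number of separate force evaluations performed.
int computeAllCopiesSequentially(const std::vector<double>& positions,
                                 std::vector<double>& forces,
                                 std::size_t numCopies,
                                 std::size_t coordsPerCopy, double k) {
    std::vector<double> contextPos(coordsPerCopy);
    std::vector<double> contextForce(coordsPerCopy);
    int evaluations = 0;
    for (std::size_t c = 0; c < numCopies; ++c) {
        const std::size_t offset = c * coordsPerCopy;
        // Analogue of "copy to context": load this copy's positions.
        for (std::size_t i = 0; i < coordsPerCopy; ++i)
            contextPos[i] = positions[offset + i];
        // One evaluation per copy; on a GPU this would be a set of kernel launches.
        computeForcesForContext(contextPos, contextForce, k);
        ++evaluations;
        // Analogue of "copy from context": store this copy's forces.
        for (std::size_t i = 0; i < coordsPerCopy; ++i)
            forces[offset + i] = contextForce[i];
    }
    return evaluations;
}
```

If the real code works like this, N copies mean N small force evaluations, so for a system of only a few atoms the fixed cost of each launch would presumably dominate, which would be consistent with the timings I describe above.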
Thanks,