OpenCL platform and multiple devices

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenCL platform and multiple devices

Post by Peter Eastman » Wed Jul 24, 2013 10:08 am

To parallelize across multiple GPUs I would explicitly have to call setDeviceIndex([0,1]) or the equivalent?
Correct.
So would the CPU be device ID 3?
That depends on the OpenCL implementation. Nvidia's version of OpenCL doesn't support running on the CPU at all. On the other hand, AMD's and Intel's versions do, and they present the CPU as another device.
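For reference, here is a rough sketch (not code from this thread) of how selecting devices looks through the C++ API, assuming the OpenCL platform's "OpenCLDeviceIndex" property; the single-particle system is only there so a Context can be constructed:

Code:

#include "OpenMM.h"
#include <map>
#include <string>

int main() {
    OpenMM::System system;
    system.addParticle(39.95);                  // one dummy particle, just enough to build a Context
    OpenMM::VerletIntegrator integrator(0.001); // 1 fs step; the value is irrelevant here

    OpenMM::Platform& platform = OpenMM::Platform::getPlatformByName("OpenCL");
    std::map<std::string, std::string> props;
    props["OpenCLDeviceIndex"] = "0,1";         // comma-separated indices: parallelize across devices 0 and 1
    // props["OpenCLDeviceIndex"] = "0";        // or pin the job to a single device

    OpenMM::Context context(system, integrator, platform, props);
    return 0;
}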

Peter

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Wed Jul 24, 2013 12:04 pm

I wrote a simple OpenCL program to check the device IDs on my platform:

Code:

cl_device_id devices[100];
cl_uint devices_n = 0;
cl_uint i;
CL_CHECK(clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 100, devices, &devices_n));

printf("=== %u OpenCL device(s) found on platform:\n", devices_n);
for (i = 0; i < devices_n; i++)
{
    char buffer[10240];
    /* This prints the raw cl_device_id handle value, not an index. */
    printf("   -- OpenCL DEVICE -- %ld\n", (long) devices[i]);
    /* Retrieves the device name into buffer (not printed here). */
    CL_CHECK(clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(buffer), buffer, NULL));
}
The results are
-- OpenCL DEVICE -- 28528320
-- OpenCL DEVICE -- 28528432
and these seem to be randomly generated. How does this translate to the 0 and 1 that OpenMM would expect for setDeviceId()?

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenCL platform and multiple devices

Post by Peter Eastman » Wed Jul 24, 2013 2:29 pm

OpenCLDeviceIndex refers to the index into the list of devices: i in your code. The numeric value of a cl_device_id has no meaning. Just think of them as opaque handles.
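If it helps, here is a small standalone sketch (error checking and the device-count limit simplified) that prints, for each GPU, the index OpenMM expects next to the device name:

Code:

#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platform;
    cl_device_id devices[16];
    cl_uint devices_n = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 16, devices, &devices_n);

    for (cl_uint i = 0; i < devices_n; i++) {
        char name[256];
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        // i is the value to use for the device index; devices[i] is only an opaque handle.
        printf("OpenCLDeviceIndex %u -> %s\n", (unsigned) i, name);
    }
    return 0;
}

On a two-GPU node this should print one line for index 0 and one for index 1, which are the values OpenMM accepts.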

Peter

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Wed Jul 24, 2013 4:21 pm

Thanks Peter, that answers all my questions. I still have a problem with switching to OpenCL, but I think that is beyond OpenMM's scope.

Consider a node with 2 GPUs. I rely on my cluster management software to schedule at most 2 MD jobs on this node, using a resource pool of 2 GPU tokens (the token count decreases by one for every MD job scheduled). Now I can get OpenCL to work by
* choosing device '0' if both GPUs are free (tokens = 2)
* choosing device '1' if one GPU is free and one is busy (tokens = 1)

But consider the case where the first job finishes. Now the scheduler has one token to spare and assigns another MD job. However, by the rule above that job will pick device '1', which is still busy, resulting in a conflict, since the new job has no way to know which particular device is actually free.

Like I said, it's beyond OpenMM's scope here, but if you have some programmatic way to handle this, please let me know! For the CUDA platform this was never a concern, since I just let CUDA handle which device to put the job on. In "Exclusive" mode CUDA always picks a free GPU.

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenCL platform and multiple devices

Post by Peter Eastman » Wed Jul 24, 2013 4:43 pm

How about creating a file on disk to indicate that a particular GPU is in use? Create it when you start the simulation and delete it when you're done. The one downside is that if a simulation crashes or gets killed prematurely, you'll have to go in and delete the file by hand.

As another option, you could have your program launch nvidia-smi and parse the result to see which GPU is in use.
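Something along these lines might work for the file-based approach (only a sketch; the lock path and the claimGpu/releaseGpu helper names are made up):

Code:

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Try to claim a GPU by creating /tmp/gpu<i>.lock exclusively; returns the
// device index that was claimed, or -1 if every GPU is already locked.
int claimGpu(int numGpus) {
    for (int i = 0; i < numGpus; i++) {
        char path[64];
        snprintf(path, sizeof(path), "/tmp/gpu%d.lock", i);
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd >= 0) {          // the file did not exist before, so this GPU is ours
            close(fd);
            return i;
        }
    }
    return -1;
}

// Delete the lock file when the simulation finishes.
void releaseGpu(int index) {
    char path[64];
    snprintf(path, sizeof(path), "/tmp/gpu%d.lock", index);
    unlink(path);
}

The index returned by claimGpu() would be what you pass as the device index, and releaseGpu() runs at the end of the job. As noted above, a crashed job leaves a stale lock behind that has to be removed by hand.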

Peter

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Wed Jul 24, 2013 4:54 pm

I thought about nvidia-smi myself, but is that not CUDA-specific? I mean, is there a guarantee that device 0 from nvidia-smi is the same as device 0 from the OpenCL API?

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenCL platform and multiple devices

Post by Peter Eastman » Wed Jul 24, 2013 5:13 pm

I don't know. But it seems likely they would be consistent. Remember, this is Nvidia's implementation of the OpenCL API you're talking about, and there's a good chance they've defined their devices in a consistent order everywhere.

Peter
