OpenCL platform and multiple devices

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Thu Jul 04, 2013 7:20 pm

Hi all

I have seen a few threads regarding this issue, like https://simtk.org/forums/viewtopic.php? ... vice#p7862 for example, but I still have an issue. I am using OpenMM 4.1, so let me know if this is the problem.

I have a node with 2 GPUs, and am trying out the OpenCL platform. Running nvidia-smi gives me

Code: Select all

siddharth@node086 ~ $ nvidia-smi
Thu Jul  4 18:58:29 2013       
+------------------------------------------------------+                       
| NVIDIA-SMI 2.285.05   Driver Version: 285.05.33      |                       
|-------------------------------+----------------------+----------------------+
| Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
| Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
|===============================+======================+======================|
| 0.  Tesla T10 Processor       | 0000:07:00.0  Off    |       N/A        N/A |
|  N/A   47 C  P0    Off /  Off |   7%  287MB / 4095MB |  100%     E. Thread  |
|-------------------------------+----------------------+----------------------|
| 1.  Tesla T10 Processor       | 0000:09:00.0  Off    |       N/A        N/A |
|  N/A   47 C  P0    Off /  Off |   1%   54MB / 4095MB |    0%     E. Thread  |
|-------------------------------+----------------------+----------------------|
| Compute processes:                                               GPU Memory |
|  GPU  PID     Process name                                       Usage      |
|=============================================================================|
|  0.  26897    ZymeCAD                                                329MB  |
|  1.  26897    ZymeCAD                                                329MB  |
+-----------------------------------------------------------------------------+
and I guess this goes along with the explanation in that thread that the context is created across all GPUs. However, it looks like the same process is spanning both GPUs. This is a problem for our scheduler, which thinks that only one GPU is in use, since that is what I requested. It then schedules another job on the same node, which fails immediately with the error

Code: Select all

terminate called after throwing an instance of 'OpenMM::OpenMMException'
what():  Error initializing context: clCreateContextFromType (-2)
[node086:26932] *** Process received signal ***
[node086:26932] Signal: Aborted (6)
[node086:26932] Signal code:  (-6)
[node086:26932] [ 0] /lib64/libpthread.so.0() [0x316040f500]
[node086:26932] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x315fc328a5]
[node086:26932] [ 2] /lib64/libc.so.6(abort+0x175) [0x315fc34085]
[node086:26932] [ 3] /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x12d) [0x3161cbea7d]
[node086:26932] [ 4] /usr/lib64/libstdc++.so.6() [0x3161cbcc06]
[node086:26932] [ 5] /usr/lib64/libstdc++.so.6() [0x3161cbcc33]
[node086:26932] [ 6] /usr/lib64/libstdc++.so.6() [0x3161cbcd2e]
[node086:26932] [ 7] /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so(_ZN6OpenMM13OpenCLContextC1EiiiRNS_14OpenCLPlatform12PlatformDataE+0x3636) [0x2b1cbae23286]
[node086:26932] [ 8] /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so(_ZN6OpenMM14OpenCLPlatform12PlatformDataC1EiRKSsS3_+0x693) [0x2b1cbae31a73]
[node086:26932] [ 9] /usr/local/openmm/lib/plugins/libOpenMMOpenCL.so(_ZNK6OpenMM14OpenCLPlatform14contextCreatedERNS_11ContextImplERKSt3mapISsSsSt4lessISsESaISt4pairIKSsSsEEE+0x2af) [0x2b1cbae320af]
[node086:26932] [10] /usr/local/openmm/lib/libOpenMM.so(_ZN6OpenMM11ContextImplC1ERNS_7ContextERNS_6SystemERNS_10IntegratorEPNS_8PlatformERKSt3mapISsSsSt4lessISsESaISt4pairIKSsSsEEE+0x899) [0x2b1ca3f109d9]
[node086:26932] [11] /usr/local/openmm/lib/libOpenMM.so(_ZN6OpenMM7ContextC1ERNS_6SystemERNS_10IntegratorERNS_8PlatformE+0x81) [0x2b1ca3f0bb91]
indicating that it somehow conflicted with the job already running on it. From the threads it seemed like choosing a particular device would result in the same behaviour. What can I do about this? I need to have 2 simulations running on this node.

I have so far been using the CUDA platform where this issue does not exist.

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenCL platform and multiple devices

Post by Peter Eastman » Fri Jul 05, 2013 10:32 am

Hi Siddharth,

This problem was fixed a long time ago. Upgrade to a newer version of OpenMM and it should be fine.

The fix was to specify only the particular device being used when creating a cl::Context. Here's the relevant code from the current version of OpenCLContext.cpp:

Code: Select all

    vector<cl::Device> contextDevices;
    contextDevices.push_back(device);
    cl_context_properties cprops[] = {CL_CONTEXT_PLATFORM, (cl_context_properties) platforms[platformIndex](), 0};
    context = cl::Context(contextDevices, cprops, errorCallback);
Peter

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Fri Jul 05, 2013 10:36 am

Thanks Peter!

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Tue Jul 23, 2013 3:44 pm

Hi Peter

I upgraded to OpenMM 5.1.1, and at least both simulations now seem to run on the same node. However, I notice that both are significantly slower than the equivalent simulation running alone on a single GPU. nvidia-smi shows

Code: Select all

Tue Jul 23 15:37:43 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.304.54   Driver Version: 304.54         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T10 Processor      | 0000:07:00.0    Off  |                  N/A |
| N/A   61C    P0    N/A /  N/A |  53% 2173MB / 4095MB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T10 Processor      | 0000:09:00.0    Off  |                  N/A |
| N/A   61C    P8    N/A /  N/A |   0%    3MB / 4095MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     29792  ZymeCAD                                              1081MB |
|    0     29848  ZymeCAD                                              1081MB |
+-----------------------------------------------------------------------------+

It looks like one simulation is consuming most of the resources while the other sits idle. Also, both simulations are using GPU 0.

When I create the context, I do not specify any particular device index; I just let it choose the defaults. On the CUDA platform, as long as I set the GPUs to "Exclusive" compute mode through nvidia-smi, I was guaranteed that one or the other GPU would be chosen. For the OpenCL platform, what is the default behaviour? For a host with 2 GPUs, if I specify nothing, will it try to run one simulation on both GPUs? Will it choose one GPU randomly, and somehow know if the other one is free?

Peter Eastman
Posts: 2541
Joined: Thu Aug 09, 2007 1:25 pm

Re: OpenCL platform and multiple devices

Post by Peter Eastman » Tue Jul 23, 2013 4:55 pm

Hi Siddharth,

The logic for automatically picking a device is entirely based on the properties of the device. It has no idea what else is running on a particular device at the same time. So if you start two simulations and let each one pick the device automatically, both of them will pick the same device. Not too useful in your case!

So you should explicitly set CudaDeviceIndex or OpenCLDeviceIndex, and specify a different device for each simulation.
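With the Python API, that looks something like the following minimal sketch. The property name OpenCLDeviceIndex comes from the advice above; the per-job value and the surrounding setup (system, integrator, platform) are assumed boilerplate, and OpenMM itself is not imported so the sketch stays self-contained:

```python
# Sketch: pin each job to one GPU via the OpenCLDeviceIndex property.
# Job 1 would use '0' and job 2 would use '1', e.g. handed down by the
# scheduler; '0' here is a hypothetical per-job value.
device_index = '0'
properties = {'OpenCLDeviceIndex': device_index}

# The dict is then passed when the context is created, e.g.
#   context = Context(system, integrator, platform, properties)
print(properties)
```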

Peter

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Tue Jul 23, 2013 5:47 pm

That's what I figured, thanks. So by not specifying anything, it's not that it runs on both GPUs and parallelizes the application further; it just runs on one GPU, is that correct? To parallelize across multiple GPUs, would I explicitly have to call setDeviceIndex([0,1]) or the equivalent?
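As far as I know there is no setDeviceIndex([0,1]) call; instead, the same OpenCLDeviceIndex property can take a comma-separated list of device indices to split one simulation across GPUs. This is an assumption based on OpenMM's documentation of that era, not on this thread, so treat it as a sketch:

```python
# Sketch: run a single simulation across both GPUs by giving the same
# property a comma-separated list of device indices (assumed form).
properties = {'OpenCLDeviceIndex': '0,1'}

# The platform would parse this back into two devices:
device_indices = [int(i) for i in properties['OpenCLDeviceIndex'].split(',')]
print(device_indices)
```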

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Wed Jul 24, 2013 9:11 am

Also, is there a particular device ID or function I can call to explicitly run OpenCL code on the CPU, not the GPU? I have a node with 2 GPUs, and the device IDs are
0 - GPU 1
1 - GPU 2
So would the CPU be device ID 3? I don't want to use the Reference platform, rather the OpenCL platform on the CPU.
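One point worth noting: a CPU OpenCL device normally lives on a different OpenCL platform (e.g. an Intel or AMD runtime) rather than being the next device index on the NVIDIA platform, which is why OpenMM exposes a separate OpenCLPlatformIndex property alongside OpenCLDeviceIndex. A sketch under that assumption; both index values below are hypothetical and depend on which runtimes are installed on the node:

```python
# Sketch: a CPU OpenCL runtime is selected by OpenCL *platform* first,
# then by device within that platform. Both indices are hypothetical.
properties = {
    'OpenCLPlatformIndex': '1',  # e.g. an Intel/AMD CPU runtime
    'OpenCLDeviceIndex': '0',    # first device on that platform
}
print(properties)
```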

Siddharth Srinivasan
Posts: 223
Joined: Thu Feb 12, 2009 6:49 pm

Re: OpenCL platform and multiple devices

Post by Siddharth Srinivasan » Wed Jul 24, 2013 10:06 am

Never mind, I found my answer at https://simtk.org/forums/viewtopic.php? ... +GPU#p9387. Thanks!
